1. Introduction

One large portion of psychology is segmented (even as chapter
headings in many texts) into such topics as sensation, motivation,
simple  selective  learning,  reaction  time,  etc.  Unlike  another 
broad  class  of  psychological  problems  (e.g.,  memory,  thinking  and 
perception),  these  areas  have  a  common  theme:  choice.  To  be  sure, 
in  the  study  of  sensation  the  choice  is  between  stimuli,  in  learning 
it  is  between  responses,  and  in  motivation  between  alternatives 
having  different  preference  evaluations;  and  some  psychologists 
believe  that  these  distinctions,  at  least  the  one  between  stimulus 
and  response,  are  basic  to  an  understanding  of  behavior.  This  paper 
will  attempt  a  mathematical  description  of  individual  choice  behavior 
where the distinction is not made except in the language used in
different interpretations of the theory. Thus, I shall use the more
neutral word "alternative" to include the several cases. There seems
little point in trying to discuss the merits and demerits of this
decision now, except to mention it in order to avoid confusion later.
The results that follow (which I believe afford some insight into,
and some integration of, utility theory, psychological and psychophysical
scaling, and learning theory) will implicitly serve as the argument
for the course taken.

1. This work was supported in part by a grant from the National
Science Foundation to Columbia University for the study of
"The mathematics of imperfect discrimination", and in part by an
Office of Naval Research contract on basic research with the Department
of Mathematical Statistics, Columbia University. Reproduction in whole
or in part for any purpose of the United States Government is permitted.

I wish to express my deep appreciation to Professors Robert
R. Bush and Eugene H. Galanter for their many helpful discussions
of these problems while the work was in progress and for their
numerous constructive suggestions about the manuscript.

A  basic  presupposition  of  the  paper  is  that  choice  behavior 
is  probabilistic,  not  algebraic.  This  is  a  commonplace  in  psychology, 
but  it  is  a  comparatively  new  and  unproven  point  of  view  in  economics 
(utility  theory).  To  be  sure,  economists  when  pressed  will  admit  that 
the  psychologist's  assumption  is  probably  the  more  accurate,  but  they 
have  argued  that  the  resulting  simplicity  warrants  an  algebraic 
idealization. Ironically, some of the following results suggest that,
on the contrary, the idealization may actually have made their problem
artificially difficult.

Once  the  probabilistic  nature  of  choice  behavior  is  admitted, 
a  problem  arises  which  does  not  exist  in  the  algebraic  models.  Complete 
data  as  to  what  choices  a  person  will  make  from  each  possible  pair 
of alternatives do not appear to determine what choice he will make
when  there  are  three  or  more  alternatives  from  which  to  choose. 

Because they cannot escape multiple alternative choice problems,
economists have been particularly sensitive to this feature of
probabilistic models, and it has undoubtedly been one of the sources of
resistance in their admitting imperfect discrimination. Early
psychologists, particularly learning theorists, experimentally studied
multiple alternatives, but as the data seemed dreadfully complicated
a trend set in toward fewer and fewer alternatives until now most
studies are conducted on simple T mazes. One can say that, for the most
part, present day psychologists have been willing to ignore (or, to be
more accurate, to bypass and postpone) the connections between
pairwise choices and more general ones. And so the relationships
have remained obscure.

I shall center my attention upon this problem. The method
of  attack  is  to  introduce  a  single  axiom  relating  the  various  probabilities 
of  choices  from  different  finite  sets  of  alternatives.  It  is  a  simple 
and,  I  feel,  intuitively  compelling  axiom  that  appears  to  illuminate 
many  of  the  more  traditional  problems,  in  particular  the  question 
as  to  whether  or  not  a  comparatively  unique  numerical  scale  exists 
and  reflects  choice  behavior.  Such  a  scale,  unique  except  for  its  unit, 
will  be  shown  to  exist  very  generally.  It  appears  to  be  the  formal 
counterpart  of  the  intuitive  idea  of  utility  (or  value)  in  economics, 
of  incentive  value  in  motivation,  of  subjective  sensation  in 
psychophysics,  and  of  response  strength  in  learning  theory. 

2.  The  Basic  Axiom 

Throughout the paper I shall suppose that a universal set U
is given which is to be interpreted as the universe of possible
alternatives (stimuli or responses). In practice, U will have to
possess a certain homogeneity: the decision maker will have to be able
to evaluate the elements of U according to some comparative dimension
and to be able to select from certain finite subsets the elements
that he thinks are superior (or inferior, or distinguished in some way)
along that dimension. For example, in economics U may be taken to be
a finite set of goods among which a person can express preferences;
in psychophysics, it may be the infinite set of possible sound energies
(at a fixed frequency) which a subject can be asked to evaluate as
to loudness; or, in learning theory, U may be the set of alternative
responses available to the organism. Note that U may be finite
or infinite.

In  general,  a  subject  is  not  asked  to  make  a  choice  from  the 
whole  of  U,  but  rather  from  some  finite  subset.  In  a  great  many 
experiments  only  two  stimuli  are  presented  to  the  subject  at  a  time 
and he is required to choose the one he prefers, or the one he deems
louder,  etc.  Of  course,  larger  subsets  could  be  used,  although 
for  the  most  part  they  have  not  been,  and  certainly  most  daily 
decisions  are  from  larger  subsets  (e.g.,  the  choice  of  a  meal  from 
a  menu  or  the  choice  between  several  jobs,  etc.). 

Let T be a finite subset of U and suppose that the choice
of an element must be confined to T. If S is a subset of T, let P(S;T)
denote the probability that the selected element is a member of the
subset S when the choice is restricted to T. These probabilities
will be the basic ingredients of the following theory. It should be
emphasized that P(S;T) denotes the probability that the single chosen
element lies in S, not the probability that S is chosen. If, however,
S = {x}, i.e., S is the single element set having x as its only
member, then P({x};T) denotes the probability that x is selected
when T is offered. I shall abbreviate this to P(x;T). If T = {x,y},
then P(x;{x,y}) is written simply as P(x,y) and it denotes the
probability that x is chosen over y when only x and y are offered.
The symbol P(x,x) will be admitted as meaning P(x;{x,y}) where y = x,
not P(x;{x}), and by convention it will always have the value 1/2.


A bit more formally, we shall suppose that for certain finite²
subsets T of U a probability measure is given on the subsets of T
which, by the axioms of probability theory, satisfies:

i. for S ⊂ T, 0 ≤ P(S;T) ≤ 1,

ii. P(T;T) = 1,

iii. if R, S ⊂ T and R ∩ S = ∅, then

\[ P(R \cup S;T) = P(R;T) + P(S;T). \]

Repeated application of iii implies that

\[ P(S;T) = \sum_{x \in S} P(x;T). \]

Note,  given  our  interpretation  of  these  probabilities,  part  ii  means 
that  the  subject  is  forced  to  make  a  choice:  the  probability  is  1 
that  his  choice  is  in  T  when  only  T  is  available. 

The  axiom  that  we  shall  explore  is  this: 

Axiom 1. If T is finite and R ⊂ S ⊂ T, then P(R;T) = P(R;S)P(S;T).

At  first  glance,  the  axiom  may  seem  to  be  a  tautology,  but 
it  is  not;  and,  at  a  second  glance,  it  may  seem  unlikely.  So  some 
discussion  is  needed.  Of  course,  it  would  be  a  tautology  if  the 
quantity P(R;S) were replaced by the conditional probability that the
choice is in R when it was made from T and is known to lie in S. But
P(R;S) is only the probability that the choice is in R when it must
be made from S, not from T. Thus, the axiom has content. As an example,
suppose that T is the set of entrees on a certain menu, S the set of
meat dishes, and R the set of beef dishes. Then P(R;T) denotes the
probability that the chosen entree is a beef dish when the whole menu
is presented; P(S;T) the probability that it is a meat dish when the
whole menu is available; and P(R;S) the probability that it is a beef
dish when only the meat dishes are presented (as sometimes happens late
in the evening when various items have run out).

2. The restriction to finite sets is not basic, but as that is all
that is needed for most of the following results, and as the theory
is easier to state for that case, I shall assume them to be finite
except in section 8.

In decision theory (see, for example, [8]) one axiomatic idea
is recurrent: the independence of irrelevant alternatives. It is captured
differently depending upon the context, but the common nature of these
axioms is clear. Intuitively, the idea is that if you are to make a
choice  from  a  set  of  alternatives,  then  the  addition  of  new  alternatives 
which  are  clearly  inferior  to  your  original  choice  should  not  cause  you 
to  alter  that  choice.  In  a  slightly  more  subtle  form,  this  is  the  content 
of  axiom  1.  It  says  that  if,  for  whatever  reason,  attention  has  been 
confined  to  S,  then  the  existence  of  the  alternatives  in  T-S  cannot 
influence  the  probability  that  the  choice  is  in  R. 
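
The menu example can be checked numerically. The following is a minimal sketch and not part of the original argument: it assumes, purely for illustration, that choice probabilities are proportional to hypothetical positive weights (the dish names and numbers are invented), and it verifies that probabilities of that kind satisfy axiom 1 for every chain R ⊂ S ⊂ T.

```python
from itertools import combinations

# Hypothetical weights for four entrees (invented for illustration only).
weights = {"steak": 4.0, "roast beef": 2.0, "lamb": 3.0, "trout": 1.0}


def p(subset, offered):
    """P(subset; offered): probability that the chosen item lies in `subset`
    when the choice is restricted to `offered` (assumed proportional rule)."""
    assert set(subset) <= set(offered)
    return sum(weights[x] for x in subset) / sum(weights[x] for x in offered)


def nonempty_subsets(s):
    """Yield every nonempty subset of s as a frozenset."""
    items = sorted(s)
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)


def axiom1_holds(prob, T, tol=1e-12):
    """Check P(R;T) = P(R;S) P(S;T) for all nonempty R within S within T."""
    for S in nonempty_subsets(T):
        for R in nonempty_subsets(S):
            if abs(prob(R, T) - prob(R, S) * prob(S, T)) > tol:
                return False
    return True


T = frozenset(weights)                          # the whole menu
S = frozenset({"steak", "roast beef", "lamb"})  # the meat dishes
R = frozenset({"steak", "roast beef"})          # the beef dishes

print("P(R;T)        =", p(R, T))
print("P(R;S) P(S;T) =", p(R, S) * p(S, T))
print("axiom 1 holds for all chains:", axiom1_holds(p, T))
```

The function `axiom1_holds` accepts any function of the form `prob(R, T)`, so a table of empirically estimated probabilities could be tested in the same way.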

I  shall  postpone  any  discussion  of  the  empirical  plausibility 
of this axiom until we know some of its consequences (see section 11).

Before  presenting  any  results,  let  me  recast  the  axiom  in  several 
alternate  forms.  Define 

\[ Q(S;T) = 1 - P(S;T). \]

In  our  interpretation,  Q(S;T)  is  the  probability  that  the  chosen  element 
is  not  in  S  when  T  is  the  set  of  possible  choices. 

Axiom 1'. If T is finite, R, S ⊂ T, and R ∩ S = ∅, then Q(R∪S;T) = Q(R;T-S)Q(S;T).

Axiom 1''. If T is finite, R, S ⊂ T, and R ∩ S = ∅, then P(R;T) = P(R;T-S)Q(S;T).

Axiom 1'''. If T is finite, x, y ∈ T, and x ≠ y, then P(x;T) = P(x;T-{y})Q(y;T).

The last axiom appears to be a good deal weaker than the other
three, and some people find it easier to accept; however, our first
result shows that all four are really the same.

Theorem 1. Axioms 1, 1', 1'', and 1''' are equivalent.

Proof. 1 implies 1'. Suppose R, S ⊂ T and R ∩ S = ∅. Let S' = T-S.
Since R ⊂ S',

\[
\begin{aligned}
P(R;T) &= P(R;S')\,P(S';T) && \text{(axiom 1)} \\
       &= P(R;S')\,[1 - P(T-S';T)] && \text{(ii and iii)} \\
       &= P(R;S')\,[1 - P(S;T)] && \text{(def. of } S') \\
       &= P(R;S')\,Q(S;T) && \text{(def. of } Q).
\end{aligned}
\tag{1}
\]

Consider,

\[
\begin{aligned}
Q(R \cup S;T) &= 1 - P(R \cup S;T) && \text{(def. of } Q) \\
              &= 1 - P(R;T) - P(S;T) && \text{(iii)} \\
              &= Q(S;T) - P(R;S')\,Q(S;T) && \text{(def. of } Q \text{ and eq. 1)} \\
              &= Q(S;T)\,[1 - P(R;T-S)] && \text{(def. of } S') \\
              &= Q(S;T)\,Q(R;T-S) && \text{(def. of } Q).
\end{aligned}
\]

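1' implies 1''. Consider,

\[
\begin{aligned}
P(R;T-S)\,Q(S;T) &= Q(S;T) - Q(R;T-S)\,Q(S;T) && \text{(def. of } Q) \\
                 &= Q(S;T) - Q(R \cup S;T) && \text{(axiom 1')} \\
                 &= P(R \cup S;T) - P(S;T) && \text{(def. of } Q) \\
                 &= P(R;T) && \text{(iii)}.
\end{aligned}
\]

1'' implies 1. Let R ⊂ S ⊂ T and call S' = T-S, then

\[
\begin{aligned}
P(R;T) &= P(R;T-S')\,Q(S';T) && \text{(axiom 1'')} \\
       &= P(R;S)\,[1 - P(T-S;T)] && \text{(def. of } S' \text{ and } Q) \\
       &= P(R;S)\,P(S;T) && \text{(ii and iii)}.
\end{aligned}
\]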

1'' implies 1'''. Let R = {x} and S = {y}.

1''' implies 1''. We prove this by induction on |S|. Suppose S = {y},
then

\[
\begin{aligned}
P(R;T) &= \sum_{x \in R} P(x;T) && \text{(iii)} \\
       &= \sum_{x \in R} P(x;T-\{y\})\,Q(y;T) && \text{(axiom 1''')} \\
       &= P(R;T-\{y\})\,Q(y;T) && \text{(iii)}.
\end{aligned}
\]

Let R ∩ S = ∅ and suppose the assertion is true for all sets having |S|-1
or fewer elements. Let y ∈ S. Clearly, R ∩ (S-{y}) = ∅, so

\[
\begin{aligned}
P(R;T) &= P[R;(T-S) \cup \{y\}]\,Q(S-\{y\};T) && \text{(induction hypothesis)} \\
       &= P(R;T-S)\,Q[y;(T-S) \cup \{y\}]\,Q(S-\{y\};T) && \text{(axiom 1''')}.
\end{aligned}
\]

However,

\[
\begin{aligned}
Q[y;(T-S) \cup \{y\}]\,Q(S-\{y\};T) &= \{1 - P[y;(T-S) \cup \{y\}]\}\,Q(S-\{y\};T) && \text{(def. of } Q) \\
  &= Q(S-\{y\};T) - P(y;T) && \text{(induction hypothesis)} \\
  &= 1 - P(S-\{y\};T) - P(y;T) && \text{(def. of } Q) \\
  &= 1 - P(S;T) && \text{(iii)} \\
  &= Q(S;T) && \text{(def. of } Q).
\end{aligned}
\]

Throughout the rest of the paper I will refer just to axiom 1,
but will use whichever of the four forms is convenient.
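
The three alternate forms can be spot-checked on a small example. The sketch below is an illustration only: it assumes choice probabilities proportional to hypothetical positive weights (the labels and numbers are invented), a rule for which axiom 1 can be verified directly, and it measures how far forms 1', 1'', and 1''' are from holding.

```python
from itertools import combinations

# Hypothetical weights on four alternatives (invented for illustration).
v = {"a": 1.0, "b": 2.0, "c": 3.5, "d": 0.5}


def P(A, T):
    """P(A;T) under the assumed proportional rule."""
    return sum(v[x] for x in A) / sum(v[x] for x in T)


def Q(A, T):
    """Q(A;T) = 1 - P(A;T)."""
    return 1.0 - P(A, T)


def nonempty_subsets(s):
    """Yield every nonempty subset of s as a frozenset."""
    items = sorted(s)
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)


T = frozenset(v)
max_gap = 0.0

# Forms 1' and 1'': R and S disjoint subsets of T (so R lies in T - S).
for S in nonempty_subsets(T):
    rest = T - S
    for R in nonempty_subsets(rest):
        gap_1p = abs(Q(R | S, T) - Q(R, rest) * Q(S, T))   # axiom 1'
        gap_1pp = abs(P(R, T) - P(R, rest) * Q(S, T))      # axiom 1''
        max_gap = max(max_gap, gap_1p, gap_1pp)

# Form 1''': distinct single elements x and y of T.
for x in T:
    for y in T - {x}:
        gap_1ppp = abs(P({x}, T) - P({x}, T - {y}) * Q({y}, T))
        max_gap = max(max_gap, gap_1ppp)

print("largest violation of 1', 1'', 1''':", max_gap)
```

A largest violation at the level of floating-point rounding is what theorem 1 leads one to expect for any rule that satisfies axiom 1.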


The  next  theorem  is  basic  to  all  later  results.  It  states, 
in  effect,  that  if  axiom  1  holds,  then  the  pairwise  probabilities 
determine  all  others. 

Theorem 2. Suppose T ⊂ U is finite and that, for all R ⊂ S ⊂ T, P(R;S)
is defined. If axiom 1 holds, then for every x ∈ S,

\[ P(x;S) = \frac{1}{\sum_{y \in S} \dfrac{P(y,x)}{P(x,y)}}, \]

where the right side is taken to be 0 if P(x,y) = 0 for some y ∈ S.

Proof. If S has only one element, the theorem is trivial, so we
assume S has two or more elements. By axiom 1 and property iii
of probability,

\[
\begin{aligned}
P(x;S) &= P(x;\{x,y\})\,P(\{x,y\};S) \\
       &= P(x,y)\,[P(x;S) + P(y;S)],
\end{aligned}
\]

for y ∈ S-{x}. Rewriting,

\[ P(y,x)\,P(x;S) = [1 - P(x,y)]\,P(x;S) = P(x,y)\,P(y;S). \tag{2} \]

There are two cases to be considered.

1. For some y ∈ S-{x}, P(x,y) = 0. Since P(y,x) = 1 - P(x,y) = 1,
this together with eq. 2 implies that P(x;S) = 0, and so the assertion
is true.

2. For all y ∈ S-{x}, P(x,y) ≠ 0. We show P(x;S) ≠ 0. Suppose,
on the contrary, P(x;S) = 0, then by eq. 2 P(y;S) = 0 for all y ∈ S-{x}.
Thus,

\[ P(S;S) = \sum_{y \in S} P(y;S) = 0, \]

which is impossible by property ii of the probability axioms, so
P(x;S) ≠ 0. This means that eq. 2 can be rewritten as

\[ \frac{P(y,x)}{P(x,y)} = \frac{P(y;S)}{P(x;S)} \tag{3} \]

for y ∈ S-{x}. But, by definition of P(x,x), eq. 3 also holds for y = x, so

\[
\frac{1}{\sum_{y \in S} \dfrac{P(y,x)}{P(x,y)}}
= \frac{1}{\sum_{y \in S} \dfrac{P(y;S)}{P(x;S)}}
= \frac{P(x;S)}{\sum_{y \in S} P(y;S)}
= P(x;S).
\]
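
Summing eq. 3 over y ∈ S gives a direct recipe for computing P(x;S) from pairwise probabilities alone. The sketch below illustrates that recipe. The function names are my own, and the pairwise table is generated from hypothetical weights (invented for illustration) only so that the input is consistent with axiom 1.

```python
def pair_from_weights(v):
    """Build a pairwise table P(a,b) = v[a] / (v[a] + v[b]) from hypothetical
    positive weights, with the convention P(a,a) = 1/2."""
    def pair(a, b):
        return 0.5 if a == b else v[a] / (v[a] + v[b])
    return pair


def choice_prob(x, S, pair):
    """P(x;S) = 1 / sum over y in S of P(y,x) / P(x,y).

    Returns 0 when P(x,y) = 0 for some y, as in case 1 of the proof."""
    total = 0.0
    for y in S:
        p_xy = pair(x, y)
        if p_xy == 0.0:
            return 0.0
        total += pair(y, x) / p_xy
    return 1.0 / total


# Hypothetical example: x is twice as "strong" as y and as z.
v = {"x": 2.0, "y": 1.0, "z": 1.0}
pair = pair_from_weights(v)
S = ["x", "y", "z"]

probs = {a: choice_prob(a, S, pair) for a in S}
print(probs)
print("sum over S:", sum(probs.values()))
```

With observed pairwise frequencies in place of the generated table, the values returned need not sum to 1. The size of that discrepancy is one informal indication of how well axiom 1 fits the data.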


3.  Existence  of  a  Numerical  Scale 

As  the  study  of  choice  behavior  has  developed,  both  in  psychology 
and  in  economics,  one  of  the  central  problems  that  a  formal  characterization 
must  solve  is  what  conditions  insure  the  existence  of  a  relatively 
unique  numerical  scale  that,  in  some  sense,  represents  the  choice  behavior 
of  the  subjects.  Mathematically,  the  problem  is  simply  one  of  imposing 
sufficient  axiomatic  structure  to  prove  the  existence  of  a  scale  which 
is unique up to some group of transformations; the group of positive
linear transformations (zero and unit unspecified) has usually been
deemed to be just acceptable. These are what Stevens [DJ calls interval
scales. But the empirical side-condition that these mathematical
assumptions  must  form  a  more  or  less  plausible  description  of  human 
and  animal  choice  behavior  has  rendered  the  problem  difficult.  There 
have  been,  I  should  say,  three  main  approaches. 

Economics. Preference among goods has been taken to be the
underlying primitive, which, as an idealization, has been assumed to be
an algebraic ordering of the goods. So long as a finite set of goods
forms the set of alternatives, many numerical order preserving scales
exist, but their uniqueness properties are completely inadequate.
That being so, economists finally arrived at the position that it is
safer to work only with orderings (as they say, with ordinal utilities
in contrast to cardinal ones³), and for many of the traditional
theorems of economics this is sufficient. Nonetheless, some work,
particularly in modern decision theory, requires cardinal utility scales.
Some extension of the traditional formulation was needed, and a decade
ago it was effected by von Neumann and Morgenstern [18]. (Actually,
Ramsey [11] suggested much the same idea a good deal earlier, but the
importance of his work was largely missed until very recently.)

Roughly,  one  continues  to  suppose  that  preferences  are  algebraic,  but 
the  domain  of  choice  is  extended  from  a  finite  set  of  goods  to  the 
infinite  set  of  all  possible  gambles  that  can  be  generated  from  the 
goods  and  an  infinite  set  of  chance  events.  Preference  over  these 
gambles  is  assumed  to  meet  certain  fairly  rigid  axioms  which,  although 
normatively  compelling,  seem,  at  best,  to  lack  detailed  descriptive 
realism.  Under  these  conditions,  a  scale  is  shown  to  exist  which  is 
unique  up  to  positive  linear  transformations  and  which  has  the  important 
property that the utility of a gamble is equal to the expected utility
of its components.

3. Psychologists would call this an interval scale of utility.

Psychophysics. The psychologist has been largely unwilling to
make  the  economist's  algebraic  idealization,  for  in  some  measure  the 
substance  of  his  choice  problem  resides  in  the  fact  that  people  are 
unable  to  make  consistent  discriminations.  The  early  psychophysicists 
proposed  to  use  these  data  as  a  means  of  scaling  subjective  sensation. 
Ultimately  I  shall  want  to  discuss  this  question  more  fully,  mainly 
because  recent  workers  have  tended  to  reject  the  earlier  ideas,  but  here 
I  shall  only  point  to  the  fact  that  the  attempt  was  made  and  that  analytical 
methods  were  presented  for  determining  an  interval  scale  whenever  certain 
consistencies are exhibited by the data. Mathematically, the
uniqueness of these scales results in large part from the assumption
that the set being scaled is a continuum, a reasonable assumption for
such dimensions as sound energy, or weight, or length, etc. For a modern
discussion of this mathematics, see [7].

Psychometrics.  In  the  remainder  of  psychology,  a  small  group 
of  workers,  often  referred  to  as  psychometricians,  have  been  concerned 
with  scaling  objects  other  than  the  traditional  sensory  stimuli. 

In  particular,  such  concepts  as  attitude,  preference,  intelligence,  and 
interest  have  concerned  them.  Their  problem  has  been  similar  to  that 
confronted  by  the  economists  in  that  their  sets  of  alternatives  are 
decidedly finite. Thus, the continuous approximation of the
psychophysicist was out, and the gambles of the utility theorist (which,
in any event, are of dubious realism in many psychological contexts)
were not thought of. The resolution arrived at during the second and
third decades of this century was roughly this. The by then somewhat
tarnished psychophysical assumption was taken over that the underlying
scale has the property that it makes discrimination uniform throughout
the scale. Since the continuum assumption could not be transferred,
this was quite insufficient to lead to a unique scale. Other assumptions
had to be added. At the time, statistics was rapidly becoming the
somewhat overworked handmaiden of psychology, and normality and
independence assumptions were in the wind. With little justification
beyond convenience and need, these were freely introduced until finally
adequate uniqueness was achieved. The result: an extensive and
unsightly literature which has been largely ignored by outsiders, who
have correctly condemned the ad hoc nature of the assumptions.

In  the  other  areas  of  choice  behavior,  specifically  motivation 
and  learning,  it  has  been  generally  assumed  that  scaling  or  measurement 
is  either  irrelevant  or  can  be  indefinitely  postponed.  Among  the 
exceptional sorties are the papers of Hull et al. [5] and Young [19].
However,  to  one  familiar  with  measurement  ideas,  the  notions  of  incentive 
value  and  response  strength  are  suggestive  of  scales. 

In  all  of  the  fields  where  scales  have  been  important,  they 
have  been  constructed  under  the  assumption  that  only  data  for  pairs  of 
stimuli  are  known.  In  the  economic  models  this  has  not  been  a  limitation 
because  of  their  algebraic  nature  (in  particular,  the  assumed  transitivity 
of  preference).  In  the  psychological  models,  where  discrimination  is 
admittedly  not  perfect,  the  pairwise  data  have  not  been  known  to  determine 
choices  from  larger  sets  and  the  whole  problem  has  remained  unresolved. 

As we have seen (theorem 2), axiom 1, if accepted, justifies complacency
on that score.

What  I  propose  to  show  in  this  section  is  that  if  choice 
discrimination  is  admitted  to  be  imperfect  and  if  axiom  1  is  assumed, 
then a scale, which is unique except for its unit, is determined for
both finite and infinite sets of alternatives.⁴ As we shall see, this
formulation solves all of the classical problems in a very simple way.

4. In Stevens' terminology, a ratio scale will be shown to exist.

Lemma 1. Suppose T = {x,y,z} ⊂ U and that, for all R ⊂ S ⊂ T, P(R;S) is
defined, axiom 1 holds, and P(s,t) ≠ 0 or 1 for s,t ∈ T. Then,


So, by theorem 2,


If v and v' are two such functions, choose k so that

\[ \sum_{y \in S} v(y) = k \sum_{y \in S} v'(y). \]

Since v > 0 and v' > 0, k > 0. Then, for any x ∈ S,

\[
\frac{v(x)}{\sum_{y \in S} v(y)}
= P(x;S)
= \frac{v'(x)}{\sum_{y \in S} v'(y)}
= \frac{k v'(x)}{k \sum_{y \in S} v'(y)}
= \frac{k v'(x)}{\sum_{y \in S} v(y)}.
\]

So, v(x) = kv'(x), for x ∈ S, which shows that v is unique up to
multiplication by a positive constant.

The limitation that P(s,t) ≠ 0 or 1 may seem severe in that it
makes the scale v extremely local; however, in general this should
prove to be no obstacle. One can suppose (at least sometimes, but
section 10 suggests not always) that none of the probabilities are
actually 0 and 1, but rather that some are very small or very large
and seem to be 0 or 1 because the number of observations is finite.
Can it be that these exceptional inversions are the ones we try to explain
away by saying that we were not "paying attention", or we were "bored
and wanted to see what would happen", etc.? Nevertheless, in practice,
we cannot estimate these limiting probabilities sufficiently accurately
to use them for scaling purposes. It appears that, to construct a
scale over the whole of U, one will have to find a series of overlapping
T's which span U in such a way that within each T the pairwise
probabilities are decidedly larger than 0 and smaller than 1. Then
theorem 3 can be applied within each T, and the arbitrary scale constants
can be chosen so that the separate scales match in the regions of overlap.⁵
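
One way to picture the matching of scale constants is the following sketch. It is an assumption of mine rather than a procedure given in the paper: two local scales, each fixed only up to its unit, are joined by choosing the constant that best aligns them on their common elements, here by averaging the log ratios. The function name, labels, and numbers are hypothetical.

```python
import math


def stitch(v1, v2):
    """Rescale local scale v2 to agree with v1 on their overlap.

    Each scale is a dict from alternative to a positive value. The constant c
    is the geometric mean of v1[z] / v2[z] over the shared alternatives, an
    assumed choice that is exact when the two scales are perfectly consistent.
    Returns the merged scale and the constant c."""
    overlap = sorted(set(v1) & set(v2))
    if not overlap:
        raise ValueError("the two sets of alternatives do not overlap")
    log_ratios = [math.log(v1[z] / v2[z]) for z in overlap]
    c = math.exp(sum(log_ratios) / len(log_ratios))
    merged = dict(v1)
    for alt, value in v2.items():
        merged.setdefault(alt, c * value)  # keep v1's values on the overlap
    return merged, c


# Two hypothetical local scales sharing the alternatives "b" and "c".
scale_1 = {"a": 1.0, "b": 2.0, "c": 4.0}
scale_2 = {"b": 1.0, "c": 2.0, "d": 6.0}

merged, c = stitch(scale_1, scale_2)
print("constant:", c)
print("merged scale:", merged)
```

With real data the ratios on the overlap will disagree somewhat, and how that disagreement should be weighted is exactly the kind of area-specific question the text leaves open.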

In another paper [6] I introduced the following definition
of an algebraic relation which approximates the pairwise discrimination
structure. For x,y ∈ T, x ≥ y if and only if P(x,z) ≥ P(y,z), for
all z ∈ T. The relation ≥ on T is referred to as the trace of the
pairwise discrimination structure. It is easy to see that the trace
is a transitive relation, but in general it need not be a weak order.
That is, there may exist incomparable pairs (x,y) for which z and z' ∈ T
can be found such that P(x,z) > P(y,z) and P(x,z') < P(y,z'). In [6]
I found it necessary to suppose that this did not occur, that is, that the
trace is in fact a weak order. The following theorem gives a
sufficient condition for this to be so.

Theorem 4. Under the conditions of theorem 3, the trace of the pairwise
discriminations forms a weak order and v is order preserving.

Proof. By the definition of the trace, x ≥ y is equivalent to
P(x,z) ≥ P(y,z), for z ∈ T. But by theorem 3, this is equivalent to

\[ \frac{1}{1 + \dfrac{v(z)}{v(x)}} \geq \frac{1}{1 + \dfrac{v(z)}{v(y)}}, \]

which, by simple algebra, is equivalent to v(x) ≥ v(y).

5. The actual details of how this should best be done will very
much depend upon the peculiar empirical difficulties of the several
areas, and I do not intend to imply that a single general method
will work. More experience in handling such problems exists in
psychophysics than elsewhere, and some hint of one method used there
is given in section 5.

Indeed, we have shown that x ≥ y if and only if P(x,y) ≥ 1/2.
This, in turn, is known [Jj to be equivalent to the condition of strong
stochastic transitivity, namely,

if P(x,y) ≥ 1/2 and P(y,z) ≥ 1/2, then P(x,z) ≥ max[P(x,y), P(y,z)].
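
Both statements can be checked on a small example. The sketch below assumes the pairwise form P(x,y) = 1/[1 + v(y)/v(x)] used in the proof above, with hypothetical scale values invented for illustration. It confirms that the trace orders the alternatives exactly as v does and that strong stochastic transitivity holds.

```python
from itertools import permutations

# Hypothetical scale values (invented for illustration).
v = {"w": 0.5, "x": 1.0, "y": 2.0, "z": 5.0}
T = sorted(v)


def P(a, b):
    """Pairwise probability 1 / (1 + v[b] / v[a]), with P(a,a) = 1/2."""
    if a == b:
        return 0.5
    return v[a] / (v[a] + v[b])


def trace_geq(a, b):
    """a >= b in the trace: P(a,z) >= P(b,z) for every z in T."""
    return all(P(a, z) >= P(b, z) for z in T)


def strong_stochastic_transitivity():
    """P(a,b) >= 1/2 and P(b,c) >= 1/2 imply P(a,c) >= max[P(a,b), P(b,c)]."""
    for a, b, c in permutations(T, 3):
        if P(a, b) >= 0.5 and P(b, c) >= 0.5:
            if P(a, c) < max(P(a, b), P(b, c)):
                return False
    return True


trace_matches_scale = all(
    trace_geq(a, b) == (v[a] >= v[b]) for a in T for b in T
)
print("trace agrees with the order of v:", trace_matches_scale)
print("strong stochastic transitivity:", strong_stochastic_transitivity())
```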


4. Fechner's Problem

One way of phrasing theorem 3 is that axiom 1 is sufficient
to make the discrimination problem mathematically one-dimensional. By
this I mean that if we set

\[ v(S) = \sum_{y \in S} v(y), \]

then P(x;S) depends only upon the ratio of v(x) to v(S). The
one-dimensionality is particularly vivid for the pairwise
discriminations where

\[ P(x,y) = \frac{1}{1 + \dfrac{v(y)}{v(x)}}. \]

The idea that discrimination along a single sensory continuum
might be mathematically one-dimensional has long been common in psychology. It was
first  postulated  by  Fechner  in  psychophysics  and  it  has  been  widely 
assumed,  but  without  an  axiomatic  justification  such  as  I  have  given  here. 
As Fechner's assumption has been subject to a good deal of discussion
and controversy in psychology, and as many psychologists now reject what
is often called the Fechnerian position, I should like to examine what
is involved in some detail.

It is generally held that Fechner assumed that the
subjective sensation of intensity arising from physical stimuli which form
a continuum is given by that transformation of the physical continuum
which renders discrimination dependent only upon sensation differences.⁶
This is now believed on empirical grounds to have been wrong (see
Stevens [15]). It seems to me that whether or not his assumption can be
rejected greatly depends upon exactly what it is, and about this there is
some confusion. As I see it, there are two quite distinct parts to it:

i. the probabilities of pairwise discriminations, the P(x,y),
are so constrained that there exists a real-valued mapping
u of the stimuli and a function F of one real variable such
that, for P(x,y) ≠ 0 or 1,

\[ P(x,y) = F[u(x) - u(y)], \]

and

ii. the function u of part i represents "subjective sensation".

Now, although part i must be true for part ii to have any
meaning at all, the truth or falsity of part ii, however we may choose to
interpret it, asserts nothing at all about the truth or falsity of part i.
This simple point seems to have been slurred over a good deal in the
discussions of Fechner's assumptions.

Psychologists  have  interpreted  part  ii  as  implying  various 
reasonable  things  about  behavior,  and  these  implications  have  turned  out 
to  be  empirically  false.  For  example,  let  x  and  y  be  two  soft  tones 
and  x'  and  y'  two  loud  tones,  all  of  the  same  frequency,  and  such  that 
u(x)-u(y) = u(x')-u(y'). It is argued that if u really represents
subjective sensation, the two differences should seem to be of the same
size to the subject; they do not. For such reasons the Fechnerian
position has been rejected: not just part ii, but also part i. It would
appear that part i should be dealt with separately and, if true, retained,
for the reduction of a multi-dimensional problem to a one-dimensional one
is an achievement not to be lightly discarded.

6. Most often psychologists phrase Fechner's assumption in terms of the
equality of sensation jnds and refer to the stated postulate as the
principle that "equally often noticed differences are equal, unless
always or never noticed." Of course, the jnd concept is actually an
algebraic construct from statistical data, and it is not surprising
to find that the two are actually the same assumption. A full discussion
of this point will be found in [7].

Part of the reason for rejecting part i as well as part ii,
even though the evidence does not force one to do so, is no doubt the fact
that that restriction is difficult to accept as a primitive axiom.
Somehow it is much too sophisticated and not sufficiently compelling to be
treated other than as an interesting conjecture. What has been lacking
is a [...] in which it would appear as a consequence.

In axiom 1, however, we have a condition that is sufficient
to prove Fechner's assumption i when discrimination is imperfect, and
to do so quite generally without restricting U to be a continuum.

This is easily seen by setting

\[ u = \log v, \]

in which case theorem 3 implies

\[ P(x,y) = \frac{1}{1 + e^{-[u(x)-u(y)]}}. \]


For  obvious  reasons,  I  shall  refer  to  log  v  as  the  Fechnerian  scale. 
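
The substitution u = log v can be confirmed numerically. The sketch below is an illustration with hypothetical scale values: it compares the ratio form of the pairwise probabilities with the difference form P(x,y) = F[u(x)-u(y)], taking F to be the logistic function that appears in the last display.

```python
import math

# Hypothetical scale values (invented for illustration).
v = {"x": 3.0, "y": 1.5, "z": 0.2}

# The Fechnerian scale u = log v.
u = {name: math.log(value) for name, value in v.items()}


def F(d):
    """The function of one real variable in part i: F(d) = 1 / (1 + e^(-d))."""
    return 1.0 / (1.0 + math.exp(-d))


def P_ratio(a, b):
    """Pairwise probability written in terms of v: v(a) / (v(a) + v(b))."""
    return v[a] / (v[a] + v[b])


def P_fechner(a, b):
    """Pairwise probability written in terms of u: F[u(a) - u(b)]."""
    return F(u[a] - u[b])


max_gap = max(abs(P_ratio(a, b) - P_fechner(a, b)) for a in v for b in v)
print("largest difference between the two forms:", max_gap)
```

Multiplying every v by the same positive constant adds a constant to every u, which leaves all differences u(x)-u(y), and hence all pairwise probabilities, unchanged.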

It  appears  that  v  is  a  much  more  basic  scale  than  Fechner's. 
For  example,  v  enters  in  theorem  3  in  a  particularly  simple  way,  making 
the calculation of P(x;S) almost trivial. In some of the following
applications, v appears to play a more central role than log v.
Nonetheless, if axiom 1 holds, Fechner was correct in the first half
of his assumption, though he need not have confined his conjecture to
stimuli which form continua.

Recently,  Stevens  [15]  has  argued  that,  although  it  is  true 
that  discrimination  is  mathematically  one  dimensional,  it  depends  upon 
ratios of scale values, not differences as assumed by Fechner. This is,
of course, what we have shown must hold for the v scale; in addition,
the  results  in  the  next  section  show  other  strong  correspondences 
between  our  scale  and  the  one  Stevens  has  discussed. 

If I am correct in my feeling that v is a basic scale, then
it  will  need  a  name.  There  are  a  multitude  of  possibilities  in  the 
literature,  among  them  response  strength,  sensation,  value,  and  preference, 
but  each  is  associated  with  a  particular  area  of  application  and  so 
would  seem  to  limit  the  generality  of  the  scale.  But,  as  my  wife  pointed 
out,  the  initial  letters  of  these  names  form  a  happy  combination,  and  so 
I  shall  dub  v  the  RSVP-scale. 

5.  Application  to  Psychophysics 

One  of  the  most  important  applications  of  this  theory  to 
psychophysics was given in the preceding section; however, as Fechner's
problem  is  not  confined  to  that  case,  I  chose  to  treat  it  separately. 

There  are,  in  addition,  two  other  topics. 

If,  with  the  psychologists,  we  reject  Fechner's  second  assumption 
that log v represents subjective sensation, then what does? Recently,
Stevens [15] and Stevens and Galanter [16] have reviewed a large
aggregate of data which, in part, seems to show that there are two
quite distinct types of psychophysical scales. "Class I seems to include,
among other things, those continua on which discrimination is mediated
by an additive or prothetic process at the physiological level. An
example is loudness, where we progress along the continuum by adding
excitation to excitation. Class II includes continua on which
discrimination is mediated by a physiological process which is substitutive,
or metathetic. An example is pitch, where we progress along the continuum
by substituting excitation for excitation, i.e., by changing the locus
of  excitation."  [l£,  p.  3]- 

In  addition  to  this  distinction  by  mechanism,  there  seem  to  be 
sharp  behavioral  differences  between  the  two  classes  of  scales.  The 
properties  of  Class  I  seem  to  be  more  consistent  and  more  thoroughly 
explored. For these, discrimination (pairwise) is approximately
proportional to physical intensity (Weber's law), or more precisely
(see Miller [9]) it is linear with intensity. Further, when a person
is asked to assign numbers to stimuli so that the numbers correspond
to subjective magnitudes (magnitude estimation), the data can be fitted
quite accurately by a power function of intensity whose exponent β is a
constant between 0.3 and 2.0, depending upon the continuum and provided
that intensity is measured in ordinary physical units (see Stevens [15]).

Let us suppose that axiom 1 holds and that the RSVP-scale v
is a continuous function of physical intensity and that Class I continua
are characterized by the property that the linear generalization of
Weber's law is true, i.e., given any number k, 1 > k > 1/2, there exist
numbers c(k) and d(k) such that

P(x,y) = k if and only if x = [1 + c(k)]y + d(k).

Then we show that

$$v(x) = \alpha (x + \gamma)^{\beta},$$

where

$$\alpha > 0, \qquad \beta = \frac{\log[k/(1-k)]}{\log[1 + c(k)]}, \qquad \gamma = \frac{d(k)}{c(k)}.$$

Since, by theorem 3,

$$P(x,y) = \frac{v(x)}{v(x) + v(y)},$$

the generalization of Weber's law can be written

$$v\{[1 + c(k)]y + d(k)\} = \frac{k}{1-k}\, v(y).$$

By slightly modifying the results in [7], one can show that the solution
to this equation is unique up to multiplication by a positive constant,
and it is easy to show by substitution that the above v is this solution.
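
The substitution can also be checked numerically. The sketch below assumes
the power form v(x) = α(x + γ)^β with β = log[k/(1-k)] / log[1 + c(k)] and
γ = d(k)/c(k); the particular values of k, c(k) and d(k) are hypothetical
and serve only as an illustration.

```python
import math

def weber_power_scale(k, c, d, alpha=1.0):
    """Return (v, beta, gamma) for v(x) = alpha * (x + gamma) ** beta."""
    beta = math.log(k / (1.0 - k)) / math.log(1.0 + c)
    gamma = d / c
    return (lambda x: alpha * (x + gamma) ** beta), beta, gamma

# Hypothetical values of k, c(k), d(k), chosen only for illustration.
k, c, d = 0.75, 0.1, 0.02
v, beta, gamma = weber_power_scale(k, c, d)

# For each base stimulus y, the comparison stimulus x = (1 + c) y + d
# should be chosen with probability k under theorem 3.
residuals = []
for y in (0.5, 1.0, 10.0, 250.0):
    x = (1.0 + c) * y + d
    residuals.append(v(x) / (v(x) + v(y)) - k)
```

The residuals vanish because x + γ = (1 + c)(y + γ), so that
v(x)/v(y) = (1 + c)^β = k/(1-k) for every y.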

As far as mathematical form is concerned, the model leads
to the correct result for Class I continua; however, the exponent β
appears to be from one to two orders of magnitude larger than that
obtained by direct methods. Stevens [15] reports β = 0.3 for loudness
when y is measured in energy units. In a study of loudness discrimination
of white noise (the results are fairly similar to those for pure tones),
Miller [9] employed a technique in which the base stimulus was always
present and periodically an increment of energy was added. He reports
that for middle and high intensities, the Weber fraction (similar
to c(k) above) corresponding to 50 per cent correct reports is 0.099
when y is measured in energy units. These data are not of the form
needed in this model, for Miller did not use a forced choice technique:
a failure to report an increment added is really an indifference
report. If we suppose that in a forced choice situation half of these
indifference reports would go one way and half the other (this is not
strictly true, but, as will be shown, this will not affect the qualitative
nature of the calculation), then k = 0.75 and c(k) = 0.099. Substituting
in the above formula yields β = 11.6. Even if we took k as small as
0.6, our formula for β would yield 4.3, which is an order of magnitude
larger than Stevens' constant. I am quite uncertain as to how this
discrepancy should be interpreted, but, as there can be no doubt that it
is not an error of measurement, it bears some investigation.

One test of this model, which has not been available for earlier
ones, is its prediction of the form of the discrimination functions.
Once β is determined from k and c(k), and with γ = d(k)/c(k), we predict
that

$$P(x,y) = \frac{(x+\gamma)^{\beta}}{(x+\gamma)^{\beta} + (y+\gamma)^{\beta}}.$$
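
The arithmetic behind the quoted exponents, and the form of the predicted
discrimination function, can be checked with a short sketch. It assumes
the exponent formula β = log[k/(1-k)] / log[1 + c(k)], takes k = 0.75 for
the half-and-half split of indifference reports, and sets γ = 0 purely for
illustration; the function names are hypothetical.

```python
import math

def weber_exponent(k, c):
    """beta = log[k / (1 - k)] / log[1 + c(k)]."""
    return math.log(k / (1.0 - k)) / math.log(1.0 + c)

def predicted_pair_prob(x, y, beta, gamma=0.0):
    """P(x,y) = v(x) / (v(x) + v(y)) with v(x) = (x + gamma) ** beta."""
    vx = (x + gamma) ** beta
    vy = (y + gamma) ** beta
    return vx / (vx + vy)

beta_75 = weber_exponent(0.75, 0.099)   # about 11.6
beta_60 = weber_exponent(0.60, 0.099)   # about 4.3

# A comparison stimulus exceeding the base by the Weber fraction 0.099
# should be chosen with probability k = 0.75.
p_at_weber_fraction = predicted_pair_prob(1.099, 1.0, beta_75)
```

Both exponents are more than ten times the value of 0.3 obtained by
magnitude estimation.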


It is unclear exactly how the scales of Class II can be
characterized, except that Weber's law does not hold and that magnitude
estimation does not yield a power law. Once, however, a law of
discrimination is given of the form

P(x,y) = k if and only if x = g(y),

theorem 3 and the analytic procedure given in [7] can be used to
determine the RSVP-scale. For example, if there are any continua
such that discrimination is independent of the physical value, i.e.,
P(x,y) = k if and only if x = y + c(k), then it is easy to show that

$$v(y) = \beta e^{\lambda y},$$

where β > 0 and

$$\lambda = \frac{\log k - \log(1-k)}{c(k)}.$$
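
As a check on the exponential form, the sketch below assumes
v(y) = (constant) · e^{λy} with λ = [log k - log(1-k)] / c(k); the values
of k and c(k) are hypothetical.

```python
import math

def exponential_scale(k, c, scale=1.0):
    """Return (v, lam) for v(y) = scale * exp(lam * y)."""
    lam = (math.log(k) - math.log(1.0 - k)) / c
    return (lambda y: scale * math.exp(lam * y)), lam

# Hypothetical criterion k and constant increment c(k).
k, c = 0.8, 0.5
v, lam = exponential_scale(k, c)

# P(y + c, y) should equal k regardless of the base value y.
checks = [v(y + c) / (v(y + c) + v(y)) for y in (0.0, 1.0, 7.5)]
```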


A second application of this theory is to Thurstone's [17]
concept of a "discriminal process", which he introduced both as a possible
explanation of imperfect discrimination and to arrive at certain
mathematical relationships which might be observed. In its simplest
form, the idea is to attach a density function f(x,t) to each stimulus
value x and to interpret it as characterizing the stimulus that the
subject "thought" was administered. Thus, for example, when x and y
are two sound energies, one supposes that an observation is drawn from
f(x,t) and another from f(y,t), and whichever is the larger determines
the stimulus that the subject calls louder. Note that we are assuming
independent observations. The model generalizes to any finite number
of stimuli. To be more formal, let

$$F(x,t) = \int_{-\infty}^{t} f(x,\tau)\,d\tau.$$

The assumption is that

$$P(x;S) = \int_{-\infty}^{\infty} f(x,t) \prod_{y \in S-\{x\}} F(y,t)\,dt. \tag{4}$$

This is the probability that, when S is the set of stimuli presented,
x will be reported as loudest (or, more generally, largest on whatever
dimension is being investigated). A similar expression can be written
for the probability, say P'(x;S), that x is reported least loud, namely,

$$P'(x;S) = \int_{-\infty}^{\infty} f(x,t) \prod_{y \in S-\{x\}} [1-F(y,t)]\,dt. \tag{5}$$

The next theorem establishes that Thurstone's assumption of
independent discriminal processes is inconsistent with the assumption that
both P and P' satisfy axiom 1.

Theorem 5. Suppose that T = {x,y,z} is a subset of the set U of positive
real numbers such that, for all R ⊂ S ⊂ T, P(R;S) and P'(R;S) are defined
and P(s,t) and P'(s,t) ≠ 0 or 1, for s,t ∈ T. If P and P' both satisfy
axiom 1, then there do not exist density functions f(s,t) for each s ∈ T
and t ∈ (-∞, ∞) such that eqs. 4 and 5 hold for all S ⊂ T.

Proof. Suppose the theorem is false; then

$$\begin{aligned}
P(x;T) - P'(x;T) &= \int_{-\infty}^{\infty} f(x,t)\,\{F(y,t)F(z,t) - [1-F(y,t)][1-F(z,t)]\}\,dt \\
&= \int_{-\infty}^{\infty} f(x,t)\,[F(y,t) + F(z,t) - 1]\,dt \\
&= P(x,y) + P(x,z) - 1.
\end{aligned}$$


By theorem 2,

$$\begin{aligned}
P(x;T) - P'(x;T) &= \frac{1}{1 + \dfrac{1-P(x,y)}{P(x,y)} + \dfrac{1-P(x,z)}{P(x,z)}} - \frac{1}{1 + \dfrac{P(x,y)}{1-P(x,y)} + \dfrac{P(x,z)}{1-P(x,z)}} \\
&= [P(x,y)+P(x,z)-1] \left\{ \frac{P(x,y)+P(x,z)-2P(x,y)P(x,z)}{[P(x,y)+P(x,z)-P(x,y)P(x,z)]\,[1-P(x,y)P(x,z)]} \right\}.
\end{aligned}$$

Equating these two expressions, the term in braces must be 1, and that is
easily reduced to the condition

$$P(x,y)\,P(x,z)\,[1-P(x,y)]\,[1-P(x,z)] = 0.$$

This can be satisfied only if P(x,y) or P(x,z) is 0 or 1,
which is contrary to hypothesis.
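
The conflict can be seen numerically. The sketch below assumes that
theorem 2 has the form P(x;T) = 1 / [1 + Σ P(y,x)/P(x,y)], the sum running
over the other members of T, and that P'(x,y) = 1 - P(x,y), as eq. 5
requires for pairs. The pairwise probabilities are hypothetical.

```python
def first_choice_prob(a, b):
    """P(x;T) under axiom 1, with a = P(x,y) and b = P(x,z)."""
    return 1.0 / (1.0 + (1.0 - a) / a + (1.0 - b) / b)

def last_choice_prob(a, b):
    """P'(x;T) under axiom 1, using P'(x,y) = 1 - P(x,y)."""
    return 1.0 / (1.0 + a / (1.0 - a) + b / (1.0 - b))

# Hypothetical pairwise probabilities, neither 0 nor 1.
a, b = 0.8, 0.6

# Value of P(x;T) - P'(x;T) if both P and P' satisfy axiom 1.
axiom1_difference = first_choice_prob(a, b) - last_choice_prob(a, b)

# Value of the same difference under independent discriminal processes.
thurstone_difference = a + b - 1.0
```

For these values the two expressions are roughly 0.368 and 0.4, so no
independent discriminal process can reproduce both P and P'.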

6.  Application  to  General  Psychometric  Problems 

The most important implication of this theory for psychometric
scaling is contained in theorem 3. So long as axiom 1 is satisfied,
any set of alternatives has a numerical scale which is unique except for
its unit. In particular, there is no need for the usual ad hoc normality
and independence assumptions, and there is no basic difference between
scaling a finite set of alternatives and scaling a psychophysical continuum.

In addition, there are two somewhat technical points that may be
of methodological interest. A subject is sometimes asked to rank order
stimuli according to some dimension instead of simply selecting the "largest"
or the "smallest" stimulus. For our purpose, it is sufficient to suppose
that he is asked to select his first and second choice and to indicate
their ordering. Although it does not follow directly from axiom 1,
the intuitive basis of that axiom would suggest that

$$R(x,y;T) = P(x;T)\,P(y;T-\{x\}) \tag{6}$$

gives the probability that x is the first choice and y the second.
Since axiom 1 gives P(y;T) = P(y;T-{x})[1 - P(x;T)], it is easy to see
that this is equivalent to

$$R(x,y;T) = P(y;T-\{x\}) - P(y;T). \tag{7}$$

If, as before, P' refers to the probability of choosing the "smallest"
stimulus, then R'(x,y;T) can be defined similarly and it is interpreted
as the probability that x is placed last and y next to last when such
choices must be made from T.

Observe that when T = {x,y,z} these two operations are
tantamount to rank ordering the stimuli, and so one might suspect that
R(x,y;T) and R'(z,y;T) would be equal. They are not generally, as
we shall now show.

Theorem 6. Suppose that T = {x,y,z} and that, for all R ⊂ S ⊂ T,
P(R;S) and P'(R;S) are defined and satisfy axiom 1. A necessary and
sufficient condition for

$$P(x;T)\,P(y,z) = P'(z;T)\,P(x,y)$$

is that P(x,y) = P(y,z).

Proof. Replace P(x;T) and P'(z;T) by the expressions given in
theorem 2 and simplify.

In  words,  if  a  person  satisfies  axiom  1  and  if  the  probability 
of  a  ranking  is  given  by  eq.  6,  then  in  general  it  matters  whether 
he  begins  the  ranking  at  the  top  or  the  bottom.  This  may  not  be  unrelated 
to  the  fairly  widespread,  but  so  far  as  I  know  undocumented,  impression 
that  most  people  exhibit  a  characteristic  direction  of  ordering, 
usually  from  the  top  down. 
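
The dependence on the direction of ranking can be illustrated numerically.
The sketch below assumes that P is generated by a scale v as in theorem 3,
that the ranking probability is given by eq. 6, and that P'(x,y) = P(y,x),
so that the scale for P' is proportional to 1/v. The scale values and
function names are hypothetical.

```python
def choice_prob(v, item, options):
    """Theorem 3: P(item; options) = v(item) / sum of v over options."""
    return v[item] / sum(v[o] for o in options)

def rank_top_down(v, first, second, options):
    """Eq. 6: R(first, second; T) = P(first; T) * P(second; T - {first})."""
    rest = [o for o in options if o != first]
    return choice_prob(v, first, options) * choice_prob(v, second, rest)

def rank_bottom_up(v, last, next_to_last, options):
    """R'(last, next_to_last; T), using the scale 1/v for P'."""
    inverse = {o: 1.0 / v[o] for o in options}
    return rank_top_down(inverse, last, next_to_last, options)

T = ["x", "y", "z"]
v_generic = {"x": 4.0, "y": 3.0, "z": 1.0}    # v(y)^2 != v(x) v(z)
v_geometric = {"x": 4.0, "y": 2.0, "z": 1.0}  # v(y)^2 == v(x) v(z)

# Probability of the rank order x, y, z obtained in the two directions.
top_generic = rank_top_down(v_generic, "x", "y", T)
bottom_generic = rank_bottom_up(v_generic, "z", "y", T)
top_geometric = rank_top_down(v_geometric, "x", "y", T)
bottom_geometric = rank_bottom_up(v_geometric, "z", "y", T)
```

The two directions agree only in the second case, where
v(x)/v(y) = v(y)/v(z), i.e., P(x,y) = P(y,z).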

Our  second  point  revolves  around  suggested  devices  for 
empirically  estimating  P(x,y).  The  difficulty  in  making  such  estimates 
for  many  stimuli  is  that  if  the  pair  (x,y)  is  presented  several  times, 
one  suspects  that  the  first  response  is  remembered  and  colors  the 
answers  to  the  later  presentations.  In  other  words,  the  first  response 
alters the P(x,y) governing the later responses. The problem, then,
is to devise dodges which allow us to estimate P(x,y) without actually
presenting the simple choice between x and y more than once. One
suggestion, which I first saw in [2], is to have the subject rank order
several finite subsets of U. In [2], subsets of four elements were
used. These sets can be chosen so that each pair of alternatives
(x,y) appears a number of times, and an obvious estimate of P(x,y)
is the number of times that x is ranked superior to y divided by the
total number of times the (x,y) pair appears. The following theorem
justifies  this  procedure  when  subsets  of  three  elements  are  used. 

Theorem 7. Suppose that T = {x,y,z} and that, for all R ⊂ S ⊂ T,
P(R;S) is defined and axiom 1 holds. Then,

$$P(x,y) = R(x,y;T) + R(x,z;T) + R(z,x;T).$$

Proof. By eq. 6,

$$R(x,y;T) + R(x,z;T) = P(x;T)[P(y,z) + P(z,y)] = P(x;T).$$

So, by eq. 7,

$$\begin{aligned}
P(x,y) &= P(x;T-\{z\}) \\
&= R(z,x;T) + P(x;T) \\
&= R(z,x;T) + R(x,y;T) + R(x,z;T).
\end{aligned}$$
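
Theorem 7 is easily verified for a particular case. The sketch below
assumes that P is generated by a scale v as in theorem 3 and that rankings
follow eq. 6; the scale values and function names are hypothetical.

```python
def choice_prob(v, item, options):
    """Theorem 3: P(item; options) = v(item) / sum of v over options."""
    return v[item] / sum(v[o] for o in options)

def rank_prob(v, first, second, options):
    """Eq. 6: R(first, second; T) = P(first; T) * P(second; T - {first})."""
    rest = [o for o in options if o != first]
    return choice_prob(v, first, options) * choice_prob(v, second, rest)

T = ["x", "y", "z"]
v = {"x": 2.0, "y": 5.0, "z": 1.0}   # hypothetical scale values

# Sum of the probabilities of the three rankings that place x above y.
estimate = (rank_prob(v, "x", "y", T)
            + rank_prob(v, "x", "z", T)
            + rank_prob(v, "z", "x", T))

pairwise = choice_prob(v, "x", ["x", "y"])
```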

Unfortunately, this result does not generalize to larger subsets.
Suppose T = {w,x,y,z} and define

$$R(w,x,y,z) = P(w;T)\,P(x;T-\{w\})\,P(y,z),$$

which we interpret as the probability of the rank order w,x,y,z. By
writing down all the R expressions in which x precedes y and simplifying,
the following estimate for P(x,y) results:

$$\begin{aligned}
P^*(x,y) = {} & P(x;T) + P(w;T)P(x;T-\{w\}) + P(z;T)P(x;T-\{z\}) \\
& + P(x,y)\,[P(w;T)P(z;T-\{w\}) + P(z;T)P(w;T-\{z\})].
\end{aligned}$$

If axiom 1 holds, then in general P*(x,y) ≠ P(x,y). This can be shown
by an example. Let

P(x,y) = 0.5,  P(x,z) = 0.7,  P(x,w) = 0.9,
P(y,z) = 0.6,  P(y,w) = 0.8,  P(z,w) = 0.7.

From theorem 2, one calculates

P*(x,y) = 0.522 > 0.5 = P(x,y).

Since the discrepancy in this example (and in others like it)
is small, P* may actually serve as a fairly good estimate of P. This
question needs further investigation.


7. Application to Stochastic Learning Theory

In the various stochastic models of learning [1], with which
I must assume the reader is familiar at least in broad outline, it is
assumed that an organism is confronted by a
set T of alternatives, say T = {1, 2, ..., i, ..., r}. His choice on trial n
is assumed to be determined by a probability distribution which we may
denote by {P_n(i;T)}, and as a result of his choice an environmental event,
an outcome, occurs which, if you please, can be thought of as
rewarding or punishing the organism, thereby altering the probabilities
which determine his choice on the next trial. We need not specify this
any more fully than to say that for each alternative-outcome pair
there is assumed to be an operator which transforms the probability
distribution {P_n(i;T)} into {P_{n+1}(i;T)}. The form of the operator
will, in general, depend upon both the choice and the outcome, but it
is assumed not to depend upon the trial number. Or put another way,
it does not depend upon the previous history of the organism except
to the extent that the history is summarized by the probability
distribution on trial n. This is known as the "independence of path"
assumption. Given that it is so, there is no loss of generality in
suppressing the trial number n and simply denoting the probabilities
of the present trial by P(i;T) and those of the following trial by P'(i;T).

Largely for reasons of mathematical simplicity, but also because
of a rationale given by Estes (see chapter 2 of [1]) and because of
the combining of classes condition (see chapter 1 of [1]), the operators
have generally been assumed to be linear functions of P(i;T). Indeed,
they can be written

$$P'(i;T) = \alpha P(i;T) + (1-\alpha)\lambda_i,$$

where α does not depend upon i. Most of the mathematical research has
been devoted to determining some of the statistical properties of this
linear model, mainly for T's having two elements, and applying it to
learning data.

[...] concentrated upon its questionable ability to handle certain empirical
phenomena,  and  only  to  a  lesser  extent  have  they  worried  about  its 
foundations.  It  has,  however,  been  argued  that  the  stimulus  conditioning 
rationale  for  the  linear  operators  is  none  too  convincing,  and  the 
combining  of  classes  condition  has  also  been  questioned.  To  me,  one 
of  the  most  frustrating  features  of  this  modern  approach  to  learning, 
as  distinct  from  some  of  the  earlier  theorizing,  has  been  the  apparent 
disparity  in  conception  between  it  and  the  models  that  have  been  created 
to  describe  static  choices,  namely  the  psychophysical  and  utility  models. 
Somehow,  if  there  is  in  fact  a  mathematical  structure  to  choice  behavior, 
one  is  inclined  to  suppose  that  there  should  be  something  in  common 
between  the  static  and  dynamic  theories. 

Intuitively,  one  connection  suggests  itself.  The  more 
traditional learning theorists (among others, Hull [4] and Spence [13])
have held that one should distinguish between the strength or intensity
of a response and the observed likelihood of that response. For example,
a 50-50 decision between two objectionable responses is not exactly
the same as a 50-50 decision between two desirable ones (Miller [10]).

If one had a numerical measure of the strength of a response, then
this distinction might afford a basis for reconciling the stochastic models
of  learning  and  the  ideas  of  psychophysical  scaling  and  utility  theory 
in  such  a  way  that  a  theory  of  the  type  desired  by  the  behavior  theorists 
would  result.  Although  the  behavior  theorists,  who  have  utilized  the 
concept  of  response  strength  quite  generally  and  amassed  an  impressive 
body  of  related  empirical  data,  have  attempted  to  cast  these  notions 
in  a  mathematical  framework,  no  really  satisfactory  axiomatization  has 
been  given. 

An  alternative  resolution  of  these  two  classes  of  theories 
is  suggested  by  theorem  3.  Since  in  a  learning  context  it  is  not 
unreasonable  to  suppose  that  an  alternative  may  be  suddenly  added  or 
dropped,  axiom  1  is  not  without  meaning.  Let  us  suppose  that  it  is 
satisfied. Then, by theorem 3 we know that the RSVP-scale v exists
and that

$$P(i;T) = \frac{v(i)}{\sum_{j \in T} v(j)}.$$


It  is  clear  that  if  v  is  multiplied  by  a  positive  constant,  the 
probability  distribution  is  unchanged.  So,  if  we  are  willing  to  emphasize 
the  RS  of  the  RSVP-scale  and  to  identify  it  with  the  intuitive  idea 
of  response  strength,  the  overall  level  of  strength  can  change  without 
necessarily  altering  the  probability  distribution.  Of  course,  in  a 
static  model,  I  would  suggest  emphasizing  the  S,  V,  or  P  and  would  identify 
the  scale  with  sensation,  value,  or  preference,  as  the  case  may  be. 
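
The indifference of the probabilities to the unit of the scale can be
made concrete. The following minimal sketch assumes the form
P(i;T) = v(i) / Σ v(j) given by theorem 3; the scale values and the
function name are hypothetical.

```python
def choice_probs(v):
    """P(i;T) = v(i) / sum over j in T of v(j)."""
    total = sum(v.values())
    return {i: vi / total for i, vi in v.items()}

# Hypothetical response strengths for three alternatives.
v = {1: 2.0, 2: 5.0, 3: 3.0}
p = choice_probs(v)

# Raising the overall level of strength tenfold leaves P unchanged.
p_rescaled = choice_probs({i: 10.0 * vi for i, vi in v.items()})
```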

Since the vector of RSVP-scale values, v = [v(1), v(2), ..., v(r)],
uniquely determines the probability distribution, but not conversely,
one is led along with the behavior theorists to suspect that the RSVP
distribution may be more basic and that the learning model should be
phrased in terms of changes in the RSVP-scale which indirectly alter
the P distributions. Assuming that this is so and that the independence
of path assumption holds for the v's, though no longer necessarily for
the P's, then a particular alternative-outcome pair will effect a change
that can be represented by a (vector) operator of the form

$$v' = f(v). \tag{8}$$

There are two important constraints upon such operators. First,
since the RSVP-scale is always non-negative, we have the

Non-negativeness condition

$$f(v) \ge 0, \quad \text{for all } v \ge 0, \tag{9}$$

where 0 is the null vector with r components.

Second, since the unit for the RSVP-scale is not determined
(theorem 3), it does not seem reasonable to permit the learning operator
to depend upon the arbitrary unit chosen for calculations; hence
we impose the

Invariance of unit condition

$$f(kv) = k f(v), \quad \text{for all real } k > 0 \text{ and all } v \ge 0. \tag{10}$$

These two conditions are insufficient to narrow the operators
down to the point where it would be feasible to try to analyze data with
this model, and so others must be added. Two directions suggest
themselves, the first of which leads finally to the type of model that
has been studied by Bush, Estes, and Mosteller. If we impose the

Superposition condition

$$f(v + v^*) = f(v) + f(v^*), \quad \text{for all } v, v^* \ge 0, \tag{11}$$

then it is well known that this together with the invariance of unit
condition constitutes the definition of a linear transformation in a
vector space and that to each such there corresponds a matrix [a_ij]
such that

$$v'(i) = \sum_{j=1}^{r} a_{ij}\, v(j).$$

By the non-negativeness condition (eq. 9), a_ij ≥ 0 for i,j ∈ T.

Observe that, in general, P'(i;T) cannot be expressed as a
linear combination of the P(j;T); however, if [a_ij] satisfies the

Constant column sum condition

$$\sum_{i=1}^{r} a_{ij} = a, \quad \text{for all } j \in T,$$

then

$$\begin{aligned}
P'(i;T) &= \frac{\sum_{j=1}^{r} a_{ij}\, v(j)}{\sum_{i=1}^{r} \sum_{j=1}^{r} a_{ij}\, v(j)} \\
&= \frac{\sum_{j=1}^{r} a_{ij}\, v(j)}{a\, v(T)} \\
&= \sum_{j=1}^{r} \frac{a_{ij}}{a}\, P(j;T).
\end{aligned} \tag{12}$$

The model defined by these four conditions will be referred
to as the alpha model; it is the one that has been discussed quite fully
by Bush and Mosteller in their book. To illustrate the relationship
among parameters, let r = 2; then

$$P'(1,2) = \alpha P(1,2) + (1-\alpha)\lambda,$$

where

$$\alpha = (a_{11} - a_{12})/a \quad \text{and} \quad \lambda = a_{12}/(a_{12} + a_{21}).$$

Mathematically, the alpha model has the interesting feature that it is
linear and satisfies the independence of path assumption both at the
level of the RSVP-scale and at the level of the probability distribution.

The first of the two special conditions, superposition,
is familiar and requires no discussion here. The second seems rather
special, but it is needed if the change in the probability distribution
is to be expressed simply as a linear combination of the probabilities
on the preceding trial. At the level of the RSVP-scale it has the
following effect: each operator results in a v'-vector the sum of whose
components is simply the constant a times the sum of the components of
the v-vector.
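
The relationship between the two levels can be illustrated for r = 2.
The sketch below assumes a non-negative matrix [a_ij] with constant
column sum a acting on the v-vector, and the parameter identities
α = (a11 - a12)/a and λ = a12/(a12 + a21). The matrix entries, the scale
values and the function names are hypothetical.

```python
def apply_matrix(a, v):
    """v'(i) = sum over j of a[i][j] * v[j]."""
    return [sum(a[i][j] * v[j] for j in range(len(v))) for i in range(len(a))]

def normalize(v):
    """Convert a vector of scale values into choice probabilities."""
    total = sum(v)
    return [x / total for x in v]

# Hypothetical operator; each column sums to the constant a = 0.9.
a_matrix = [[0.6, 0.2],
            [0.3, 0.7]]
a = 0.9
v = [3.0, 1.0]

p = normalize(v)
p_next_from_v = normalize(apply_matrix(a_matrix, v))         # via the v's
p_next_from_p = [x / a for x in apply_matrix(a_matrix, p)]   # via eq. 12

# The familiar linear-operator form for alternative 1.
alpha = (a_matrix[0][0] - a_matrix[0][1]) / a
lam = a_matrix[0][1] / (a_matrix[0][1] + a_matrix[1][0])
p1_linear = alpha * p[0] + (1.0 - alpha) * lam
```

All three routes give the same new probability for alternative 1, here 5/9.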

The second direction that one can take is interesting because
it leads to some new possibilities and because, to my mind, the assumptions
needed seem fairly plausible. It will be recalled that in working with
the alpha model Bush and Mosteller impose the combining classes condition
which leads to the result that P'(i;T) depends only upon P(i;T), and not
upon the rest of the distribution (for r = 2, this is trivially true).
This conclusion has a certain intuitive appeal as a general condition
on the learning operator, especially since it is another interpretation
of the independence of irrelevant alternatives idea. So, we introduce the

Independence of irrelevant alternatives condition. For each
alternative-outcome pair, the learning operator f shall be of the form

$$v'(i) = f_i[v(i)], \quad \text{for } i \in T. \tag{13}$$

In the presence of this condition, the non-negativeness condition
(eq. 9) becomes

$$f_i[v(i)] \ge 0, \quad \text{for all } v(i) \ge 0, \tag{14}$$

and the invariance of unit condition (eq. 10) becomes

$$f_i[k\,v(i)] = k f_i[v(i)], \quad \text{for all real } k > 0 \text{ and all } v(i) \ge 0. \tag{15}$$

Finally, it does not seem unreasonable to impose the

Continuity condition. Each f_i is a continuous function of its argument.

I shall refer to the learning model characterized by these four
conditions (non-negativeness, invariance of unit, independence of
irrelevant alternatives, and continuity) as the beta model. It is
well known that the only continuous solutions for eqs. 14 and 15 are
of the form

$$f_i[v(i)] = \beta_i\, v(i),$$

where β_i is a non-negative constant. Observe that β_i > 1 effects an
increase in v(i) and β_i < 1 a decrease, so we may identify these
with reward and non-reward (or punishment) of response i, if we so desire.

As  is  easily  seen,  the  beta  model  is  a  special  case  of  the 
matrix  model  (defined  by  non-negativeness,  invariance  of  unit  and 
superposition  conditions)  in  which  the  matrix  is  diagonal. 

An operator of the beta model yields a probability distribution
of the form

$$P'(i;T) = \frac{\beta_i\, v(i)}{\sum_{j=1}^{r} \beta_j\, v(j)},$$

which we see cannot generally be expressed in terms of the P(j;T) alone.
There is, however, an important special case where some simplification
is possible. Intuitively, it seems plausible that when alternative i
is chosen, the effect of the outcome is to change v(i) while leaving
the other v's unaltered, i.e.,

$$\beta_i = \beta, \qquad \beta_j = 1, \quad j \ne i.$$

This will be referred to as the simple beta model. It is easy to see
that this implies that

$$P'(i;T) = \frac{\beta P(i;T)}{1 + (\beta-1)P(i;T)}, \qquad
P'(j;T) = \frac{P(j;T)}{1 + (\beta-1)P(i;T)}, \quad j \ne i.$$

Although in general the probability distribution P' still does not depend
upon the distribution P alone, when there are only two alternatives,
say i = 1 and j = 2, then the simple beta model becomes

$$P'(1,2) = \frac{\beta P(1,2)}{1 + (\beta-1)P(1,2)},$$

$$P'(2,1) = \frac{P(2,1)}{\beta - (\beta-1)P(2,1)} = 1 - P'(1,2).$$

This is a non-linear operator of the type studied by Bush, Estes,
Mosteller, and others.

The  fact  that  the  two  alternative  operator  of  the  simple  beta 
model  is  non-linear  at  the  level  of  the  P's  should  not  cause  alarm,  for 
it  is  linear  at  the  level  of  the  v's.  Furthermore,  since  there  is  no 
additive  constant  as  in  the  usual  linear  operators  of  the  alpha  model, 
these  operators  commute,  i.e.,  the  order  in  which  they  are  applied  is 
immaterial.  For  some  calculations,  this  results  in  a  considerable 
simplification. 
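
The commutativity is easy to exhibit. The sketch below assumes the
two-alternative simple beta operator P' = βP / [1 + (β-1)P] and, for
contrast, a linear operator of the form P' = αP + (1-α)λ. All parameter
values and function names are hypothetical.

```python
def beta_operator(p, beta):
    """Simple beta model, two alternatives: P' = beta P / (1 + (beta-1) P)."""
    return beta * p / (1.0 + (beta - 1.0) * p)

def linear_operator(p, alpha, lam):
    """Linear operator of the alpha model: P' = alpha P + (1 - alpha) lam."""
    return alpha * p + (1.0 - alpha) * lam

p0 = 0.3
b_reward, b_nonreward = 1.5, 0.8   # hypothetical beta values

# Beta operators applied in either order, and as one combined operator.
beta_ab = beta_operator(beta_operator(p0, b_reward), b_nonreward)
beta_ba = beta_operator(beta_operator(p0, b_nonreward), b_reward)
beta_product = beta_operator(p0, b_reward * b_nonreward)

# Two linear operators applied in either order.
lin_ab = linear_operator(linear_operator(p0, 0.8, 0.0), 0.5, 1.0)
lin_ba = linear_operator(linear_operator(p0, 0.5, 1.0), 0.8, 0.0)
```

The beta operators compose by multiplying their β's, which is simply the
statement that they act multiplicatively on v; the two linear operators
give 0.62 in one order and 0.52 in the other.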

The  real  question  now  is  whether  the  beta  model  can  account 
for  data  as  well  as  or  better  than  the  alpha  model,  and  whether  or  not 
there  are  some  learning  experiments  and  phenomena  that  it  can  handle 
which  had  previously  seemed  outside  the  scope  of  stochastic  learning 
theory. These are definitely open problems. At the time of writing, only
some preliminary calculations on the simple beta model have been carried
out (under the direction of Robert R. Bush). They were, however,
sufficiently encouraging to make us undertake more detailed calculations;
they will be reported elsewhere. Also, only a little thought has been
given to phenomena that the beta model might be able to treat that have
seemed to be outside the alpha model. This, of course, is a possibility
since the new distribution of probabilities will generally depend upon
the RSVP-scale, not just on the previous probabilities. Two ideas
immediately come to mind. First, predictions about behavior are possible
when the set of alternatives is suddenly changed, as, for example, when
the  experimenter  blocks  a  passage  in  a  maze  after  a  certain  number  of 
trials.  Second,  people  have  conjectured  in  the  past  that  the  time  to 
reach  a  decision  is  highly  dependent  upon  response  strength,  which  suggests 
that  one  should  study  the  relationship  between  latencies  and  distributions 
of  RSVP-scale  values.  Such  a  study  is  in  the  planning  stage. 

8.  Application  to  Choice  Reaction  Time 

The following derivation of the distribution of decision latencies
is well known. Denote by P(0,t) the probability that a decision initiated
at time 0 is made by time t, and set Q(0,t) = 1 - P(0,t). It is assumed
that, if a decision has not been reached by time t, the probability
that it is reached in the interval from t to t + Δt, where Δt is small,
is given by λ(t)Δt. Thus,

$$Q(0,t+\Delta t) = Q(0,t)\,[1 - \lambda(t)\Delta t],$$

so, rewriting,

$$\frac{Q(0,t+\Delta t) - Q(0,t)}{\Delta t} = -\lambda(t)\,Q(0,t).$$

Taking the limit, integrating, and entering in the reasonable initial
condition Q(0,0) = 1,

$$Q(0,t) = \exp\left[-\int_0^t \lambda(x)\,dx\right].$$

Frequently it is assumed that

$$\lambda(t) = \begin{cases} \lambda & \text{for } t > t_0, \\ 0 & \text{for } t \le t_0. \end{cases}$$
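
The limiting argument can be imitated numerically by multiplying the
factors 1 - λ(t)Δt over many small steps. The sketch below assumes a
hazard that is zero up to a dead time t_0 and constant thereafter; the
numerical values and function names are hypothetical.

```python
import math

def survival_discrete(hazard, t, steps):
    """Q(0,t) approximated by the product of 1 - hazard(s) * dt over steps."""
    dt = t / steps
    q = 1.0
    for n in range(steps):
        q *= 1.0 - hazard(n * dt) * dt
    return q

def constant_after_dead_time(lam, t0):
    """Hazard equal to lam for t > t0 and to 0 for t <= t0."""
    return lambda t: lam if t > t0 else 0.0

lam, t0, t = 2.0, 0.5, 1.5   # hypothetical rate, dead time, and latency
q_numeric = survival_discrete(constant_after_dead_time(lam, t0), t, 200000)

# Closed form exp[-integral of the hazard] = exp[-lam * (t - t0)].
q_exact = math.exp(-lam * (t - t0))
```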

For reasons that are not entirely clear to me, I have always been
somewhat uneasy about this derivation: what is actually assumed does not
seem to be explicit. The following alternative derivation of the same
result suggests that axiom 1, extended a bit, lies behind it.


Let us assume that axiom 1 holds for all (infinite, as well as
finite) sets $R \subset S \subset T \subset U$ where the three probabilities $P(R;T)$, $P(S;T)$,
and $P(R;S)$ are defined. In particular, suppose that $U$ denotes the
non-negative reals (the time continuum) and that $P(R;T)$ is always defined
when $R$ and $T$ are intervals. Thus, if $\tau < t$ and

$$R = \{x \mid \tau \le x \le t\} = [\tau, t],$$

$$S = \{x \mid 0 \le x < \tau\} = [0, \tau),$$

$$T = \{x \mid 0 \le x < \infty\} = [0, \infty),$$

axiom 1 asserts that

$$Q([0,t];[0,\infty)) = Q([\tau,t];[\tau,\infty))\,Q([0,\tau);[0,\infty)).$$

Assuming that $Q([x,t];[x,\infty))$ is differentiable in $t$ for every $x \ge 0$,
take the logarithm of this expression and then differentiate with
respect to $t$:

$$\frac{\partial Q([0,t];[0,\infty))}{\partial t} \Big/ Q([0,t];[0,\infty)) = \frac{\partial Q([\tau,t];[\tau,\infty))}{\partial t} \Big/ Q([\tau,t];[\tau,\infty)).$$


Since this holds for every $\tau \ge 0$, the expression is only a function of $t$;
call it $-\lambda(t)$. If we now integrate, we obtain

$$Q([\tau,t];[\tau,\infty)) = \exp\left[-\int_\tau^t \lambda(x)\,dx + F(\tau)\right],$$

where $F$ is an arbitrary function. However, if we impose the reasonable
condition that

$$Q([\tau,\tau];[\tau,\infty)) = 1,$$

then we get

$$Q([\tau,t];[\tau,\infty)) = \exp\left[-\int_\tau^t \lambda(x)\,dx\right].$$

If, in addition, we assume that

$$Q([\tau,t];[\tau,\infty)) = Q([0,t-\tau];[0,\infty)),$$

then it is easy to show that $\lambda(t)$ is a constant.

9.  Application  to  Information  Theory

The  primary  observation  that  I  want  to  make  in  this  section  is 
that Shannon's axiomatic derivation [12] of entropy only makes sense
if  the  device  making  the  selections  satisfies  axiom  1.  This  means  that 
whenever  this  statistic  is  used  to  describe  animal  or  human  behavior, 
either  Shannon's  axiomatic  justification  is  implicitly  rejected  or 
axiom  1  is  implicitly  assumed. 

The  heart  of  the  matter  is  contained  in  Shannon's  third  axiom 
defining  entropy.  He  assumed  that  the  entropy  of  a  distribution  P(x;T), 
where  T  is  finite,  can  always  be  expressed  as  the  sum  of  two  quantities, 
namely: 

i. the entropy of the set $T$ in which any subset $S$ is treated as
a single element having probability $P(S;T) = \sum_{x \in S} P(x;T)$, plus

ii. $P(S;T)$ times the entropy of the set $S$ with the distribution
$P(x;T)/P(S;T)$, for $x \in S$.

However, if we are discussing selections made by an organism, the
probabilities of selection from $S$ are independently defined as $P(x;S)$,
and so the axiom only seems to make sense if we assume

$$P(x;S) = P(x;T)/P(S;T).$$

Rewriting and summing over all $x \in R$, where $R \subset S$, we obtain

$$P(R;T) = P(R;S)P(S;T),$$

which is axiom 1.
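
The argument above can be checked numerically with a small Python sketch. It assumes, purely for illustration, that choice probabilities come from a positive scale `v` through $P(x;S) = v(x)/\sum_{y \in S} v(y)$, so that $P(x;S) = P(x;T)/P(S;T)$ holds by construction; the scale values and the sets `T`, `S`, and `R` are hypothetical. Under that assumption both axiom 1 and Shannon's grouping property should hold up to rounding.

```python
import math

# Hypothetical scale values; any positive numbers would serve.
v = {"a": 1.0, "b": 2.0, "c": 3.0, "d": 4.0}


def choice_probs(subset):
    """P(x; subset) = v(x) / sum of v over the subset (an assumed form)."""
    total = sum(v[x] for x in subset)
    return {x: v[x] / total for x in subset}


def entropy(probs):
    """Shannon entropy (natural logarithm) of a collection of probabilities."""
    return -sum(p * math.log(p) for p in probs if p > 0)


T = ["a", "b", "c", "d"]
S = ["a", "b", "c"]
R = ["a", "b"]

p_T = choice_probs(T)
p_S = choice_probs(S)
P_S_T = sum(p_T[x] for x in S)
P_R_T = sum(p_T[x] for x in R)
P_R_S = sum(p_S[x] for x in R)

# Axiom 1: P(R;T) = P(R;S) P(S;T)
axiom1_gap = abs(P_R_T - P_R_S * P_S_T)

# Grouping: H(T) = H(T with S lumped into one element) + P(S;T) * H(S)
lumped = [P_S_T] + [p_T[x] for x in T if x not in S]
grouping_gap = abs(entropy(p_T.values())
                   - (entropy(lumped) + P_S_T * entropy(p_S.values())))
print(axiom1_gap, grouping_gap)
```

If the probabilities for `S` were instead specified independently of those for `T`, the grouping gap would in general be nonzero, which is the point of the passage.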

One trivial point of terminology. By theorem 3,

$$H = -\sum_{x \in T} P(x;T)\log P(x;T) = \log v(T) - \sum_{x \in T} P(x;T)\log v(x),$$

so the entropy of the distribution $P(x;T)$ is the Fechnerian value of $T$
minus the expected Fechnerian value of its elements.

10.  Application  to  Utility  Theory

As  with  the  applications  to  psychological  scaling,  the  central 
consequence for utility theory is contained in theorem 3: a utility
function,  unique  except  for  its  unit,  exists  over  any  set  of  goods 
provided  that  preference  discrimination  among  the  goods  is  not  perfect 
and  that  axiom  1  is  met.  If,  as  may  be  generally  expected,  there  are 
cases  of  perfect  preference  discrimination,  the  scale  will  have  to  be 
pieced together in the manner described following theorem 3. This may
force  one  to  introduce  other  goods,  or  some  gambles  among  goods,  in  order 
that  a  sequence  of  overlapping  sets  exist  connecting  any  two  goods  such 
that  discrimination  is  imperfect  within  each  of  the  sets. 

In the remainder of this section, I should like to begin an
examination of choice behavior among gambles in terms of the present
theory. The problem is not resolved, and the results show that it is
about as nasty as intuition suggests that it is.

Historically,  gambles  were  introduced  into  the  utility  problem 
largely,  though  not  entirely,  to  arrive  at  a  cardinal  (interval)  utility 
scale,  and  as  we  now  know  they  were  not  really  necessary  if  imperfect 
discrimination  is  admitted.  Nonetheless,  the  gambles  generated  from 
a  set  of  goods  and  a  (finite  or  infinite)  set  of  events  do  form  a 
possible  set  of  alternatives  of  considerable  importance.  Especially 
significant  for  decision  theory  is  whether  a  utility  function  exists 
having the property that the utility of a gamble equals the expected
utility of its components. This so-called expected utility hypothesis
follows from von Neumann and Morgenstern's [18] and related algebraic
axiom systems. In another paper [6] I attempted to extend von Neumann
and Morgenstern's algebraic model to a probabilistic one, and although
interesting and fairly plausible results were obtained, it seemed clear
from a number of simple examples that something was not quite right.

It turned out that if a utility function having the property described
in theorem 4 above also satisfies the expected utility hypothesis, then
it  is  a  Fechnerian  scale  (which  seems  nice),  but  it  also  followed  that 
there  could  be  almost  no  cases  of  either  perfect  preference  discrimination 
or  perfect  likelihood  discrimination,  and  that  just  does  not  seem  to  be 
true  of  people.  A  major  question  in  my  mind  has  been  whether  to  pursue 
that  tack,  assuming  that  cases  of  apparent  perfect  discrimination 
are  not  real  but  only  result  from  the  finite  sizes  of  our  samples,  or 
whether  to  attack  the  problem  anew  assuming  that  both  perfect  and 
imperfect  discriminations  must  be  handled  within  the  same  theory. 

It  seems  as  if  a  theory  of  the  latter  type  will  be  quite  complex.  The 
following  results  suggest,  however,  that  if  axiom  1  is  true  we  have 
no  alternative. 

As in [6], let $A$ be a set (of pure alternatives) and $E$ a Boolean
algebra (of events) with null element $o$. A set $S(A,E)$ (of gambles) is
defined as follows:

i. if $a \in A$, then $a \in S(A,E)$, and
if $a, b \in S(A,E)$ and $\alpha \in E$, then $a\alpha b \in S(A,E)$;

where, for every $a, b \in A$ and $\alpha \in E$,

ii. $a\alpha a = a$,

iii. $a o b = b$,

iv. $a\alpha b = b\bar{\alpha}a$, where $\bar{\alpha}$ denotes the complement of $\alpha$.

The symbol $a\alpha b$ is interpreted as the gamble in which $a$ is the outcome
if the chance event $\alpha$ (e.g., rain tomorrow in New York City) occurs
and $b$ if it does not. Condition ii says that the "gamble" in which
$a$ is the outcome whether or not $\alpha$ occurs is exactly the same as the
pure outcome $a$. The other two conditions can be interpreted equally
simply.

We shall suppose that choices from subsets of S(A,E) are
described by probabilities. The generic symbol $P$ will be used, so,
for example, $P(a\alpha b, c\beta d)$ denotes the probability that gamble $a\alpha b$ will be
selected in preference to $c\beta d$ when only these two gambles are offered.

In general, utility theory has confined itself to pairwise comparisons,
but I shall of course suppose that the general probabilities of section 2
are defined. In addition, let us suppose that a similar set of
probabilities, having the generic symbol $Q$, is defined over subsets
of the set $E$ of events. These are interpreted as the probability of
selection in terms of (subjective) likelihood of occurrence. Thus,
for example, $Q(\alpha,\beta)$ denotes the (objective) probability that event $\alpha$
seems subjectively more likely to occur than event $\beta$.

In [6] the following axiom was introduced as a possible basic
restriction in the study of choice behavior among gambles:

If $a, b \in A$ and $\alpha, \beta \in E$, then

$$P(a\alpha b, a\beta b) = P(a,b)Q(\alpha,\beta) + P(b,a)Q(\beta,\alpha).$$

We  say  that  P  is  decomposable  with  respect  to  Q  when  this  axiom  is  met. 
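
A minimal Python sketch of the decomposability axiom follows. It assumes that the pairwise probabilities are complementary, i.e. $P(b,a) = 1 - P(a,b)$ and $Q(\beta,\alpha) = 1 - Q(\alpha,\beta)$; the function name `decomposed` and the numerical values are hypothetical and serve only to show how the two discriminations combine.

```python
def decomposed(p_ab, q_alpha_beta):
    """P(a-alpha-b, a-beta-b) under the decomposability axiom.

    p_ab          -- P(a,b), probability that a is preferred to b
    q_alpha_beta  -- Q(alpha,beta), probability that alpha seems more
                     likely than beta
    Assumes P(b,a) = 1 - P(a,b) and Q(beta,alpha) = 1 - Q(alpha,beta).
    """
    return p_ab * q_alpha_beta + (1.0 - p_ab) * (1.0 - q_alpha_beta)


# Hypothetical values: a is usually preferred, alpha usually seems more likely.
example = decomposed(0.8, 0.7)

# If a is always preferred to b, the choice reduces to the event comparison.
perfect_preference = decomposed(1.0, 0.7)

# If a and b are indifferent, the events no longer matter.
indifferent = decomposed(0.5, 0.9)
print(example, perfect_preference, indifferent)
```

The last two cases correspond to the values $P(a,b) = 1$ and $P(a,b) = 1/2$, for which the gamble probability depends on $Q$ alone or not at all.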

The intuitive grounds for supposing decomposability are these. In
choosing between $a\alpha b$ and $a\beta b$, it seems reasonable to say that the former
is preferred to the latter in exactly two cases:

i. when $a$ seems preferable to $b$ and $\alpha$ seems more likely to occur
than $\beta$, and

ii. when $b$ seems preferable to $a$ and $\beta$ seems more likely
to occur than $\alpha$.

If we assume that the two acts of discrimination are statistically
independent, which is at least plausible when $a$ and $b$ are pure alternatives,
but not necessarily when they are themselves gambles, then the probability
of the first occurrence is $P(a,b)Q(\alpha,\beta)$ and of the latter is $P(b,a)Q(\beta,\alpha)$.
Since i and ii are exclusive alternatives, adding the two probabilities
should yield the probability of choosing $a\alpha b$ over $a\beta b$, hence the axiom.

Since  axiom  1  seems  plausible  within  any  choice  context  and 
decomposability  is  at  least  possible  within  the  context  of  gambles, 
it  would  be  interesting  to  know  their  joint  consequences.  I  shall,  however, 
confine  my  attention  only  to  some  special  results  which  indicate  an 
important  necessary  feature  of  a  theory  of  gambles. 


Theorem 8. Suppose that

i. $P$ is decomposable with respect to $Q$,

ii. for $P$ over $T = \{a\alpha b, a\beta b, a\gamma b\}$, where $a, b \in A$, axiom 1 holds
and $P(s,t) \ne 0$ or 1 for $s, t \in T$,

iii. for $Q$ over $F = \{\alpha, \beta, \gamma\}$ axiom 1 holds and $Q(\sigma,\tau) \ne 0$ or 1
for $\sigma, \tau \in F$.

Then, either

i. $P(a,b) = 0$, $1/2$, or 1, or

ii. $[Q(\alpha,\beta) - 1/2] + [Q(\beta,\gamma) - 1/2] = [Q(\alpha,\gamma) - 1/2]$.

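Proof. Suppose that $P(a,b) \ne 1$; then the decomposition hypothesis i
can be written

$$P(a\alpha b, a\beta b) = P(b,a)\left\{1 + \left[\frac{P(a,b)}{P(b,a)} - 1\right]Q(\alpha,\beta)\right\}.$$

From hypothesis ii, we know (lemma 1) that

$$\frac{P(a\alpha b, a\beta b)}{P(a\beta b, a\alpha b)} \cdot \frac{P(a\beta b, a\gamma b)}{P(a\gamma b, a\beta b)} = \frac{P(a\alpha b, a\gamma b)}{P(a\gamma b, a\alpha b)}.$$

Setting $A = \frac{P(a,b)}{P(b,a)} - 1$, these combine to yield

$$\frac{[1 + AQ(\alpha,\beta)][1 + AQ(\beta,\gamma)]}{[1 + AQ(\beta,\alpha)][1 + AQ(\gamma,\beta)]} = \frac{1 + AQ(\alpha,\gamma)}{1 + AQ(\gamma,\alpha)}.$$

Cross-multiplying and simplifying yields

$$A(A+1)\{2[Q(\alpha,\beta) + Q(\beta,\gamma) - Q(\alpha,\gamma)] - 1\} + A^3[Q(\alpha,\beta)Q(\beta,\gamma)Q(\gamma,\alpha) - Q(\alpha,\gamma)Q(\gamma,\beta)Q(\beta,\alpha)] = 0.$$

From hypothesis iii, we know that lemma 1 holds for $Q$, so the coefficient
of the $A^3$ term is 0; hence the first term is 0, which yields the
assertion.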
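
The cross-multiplication step in the proof can be checked with exact rational arithmetic. The Python sketch below writes $x = Q(\alpha,\beta)$, $y = Q(\beta,\gamma)$, $z = Q(\alpha,\gamma)$ and assumes complementary pairwise probabilities, e.g. $Q(\beta,\alpha) = 1 - x$. The second part assumes, as one way of satisfying lemma 1 for $Q$, that $Q(\sigma,\tau) = w(\sigma)/[w(\sigma) + w(\tau)]$ for a hypothetical positive scale `w`; all numerical values are illustrative only.

```python
from fractions import Fraction as Fr
import random


def residual(A, x, y, z):
    """Left side minus right side of the combined equation after cross-multiplying."""
    lhs = (1 + A * x) * (1 + A * y) * (1 + A * (1 - z))
    rhs = (1 + A * z) * (1 + A * (1 - x)) * (1 + A * (1 - y))
    return lhs - rhs


def expanded(A, x, y, z):
    """The simplified form: A(A+1){2[x + y - z] - 1} + A^3[...]."""
    first = A * (A + 1) * (2 * (x + y - z) - 1)
    cubic = A**3 * (x * y * (1 - z) - z * (1 - y) * (1 - x))
    return first + cubic


# The two forms agree identically, here tested on random rational inputs.
random.seed(0)
checks = []
for _ in range(100):
    A, x, y, z = (Fr(random.randint(1, 99), 100) for _ in range(4))
    checks.append(residual(A, x, y, z) == expanded(A, x, y, z))
identity_holds = all(checks)

# With Q generated from a (hypothetical) ratio scale w, the A^3 coefficient vanishes.
wa, wb, wc = Fr(1), Fr(2), Fr(5)
x, y, z = wa / (wa + wb), wb / (wb + wc), wa / (wa + wc)
cubic_coeff = x * y * (1 - z) - z * (1 - y) * (1 - x)
print(identity_holds, cubic_coeff)
```

Once the cubic coefficient is zero, the remaining factor forces $A = 0$, $A = -1$, or the additive relation among the $Q$ values, which is the statement of the theorem.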

For any $a$ and $b$, we should always be able to find three events
$\alpha$, $\beta$, and $\gamma$ which are not perfectly discriminated and such that the
three gambles $a\alpha b$, $a\beta b$, and $a\gamma b$ also are not perfectly discriminated.
If so, then the theorem is tantamount to saying that axiom 1 and
decomposability imply either that perfect discrimination exists among
pure alternatives or an extremely special relation holds among the
probabilities for the events. Since it seems unlikely that the latter
relation will be sustained empirically (see the corollary below), one
is forced to conclude that the pure alternatives must be perfectly
discriminated. This seems to accord with one's intuitions about a lot
of pure alternatives (e.g., money), but it means that a choice theory
for gambles is bound to be quite complex. I tried in [6] to avoid
just that complexity by denying perfect discrimination among pure
alternatives, and some of the results were a bit strange.

Corollary. Suppose that the conditions of the theorem hold for
$F = \{\alpha, \bar{\alpha}, \beta, \bar{\beta}\}$ and $T = \{a\alpha b, a\bar{\alpha}b, a\beta b, a\bar{\beta}b\}$, that $Q(\beta,\bar{\beta}) = 1/2$
(i.e., $\beta$ has subjective probability 1/2), and that $1/2 < P(a,b) < 1$.
Then, $Q(\alpha,\beta) < 3/4$.

Proof. By theorem 8,

$$Q(\alpha,\beta) + Q(\beta,\bar{\beta}) - Q(\alpha,\bar{\beta}) = 1/2$$

and

$$Q(\alpha,\beta) + Q(\beta,\bar{\alpha}) - Q(\alpha,\bar{\alpha}) = 1/2.$$

By theorem 2 of [6], $Q(\beta,\bar{\alpha}) = Q(\alpha,\bar{\beta})$. Substituting this and adding
the two equations yields

$$2Q(\alpha,\beta) = 1 + Q(\alpha,\bar{\alpha}) - Q(\beta,\bar{\beta}) < 1 + 1 - 1/2 = 3/2.$$

Even though I can cite no specific data, this result seems so
counter-intuitive that one is forced to conclude that $P(a,b)$ must equal
0, 1/2, or 1.

It  will  be  recalled  that  in  [6]  a  functional  equation  involving 
just  Q  was  derived  (theorem  7)  from  a  notion  of  subjective  independence 
(definitions  7  and  8).  Using  that  equation  and  an  argument  much  like 
that employed in theorem 8, one can show that, for any $\alpha \in E$ and the
null event $o$, either $Q(\alpha,o) = 1/2$ or 1. In words, any event must either
have  the  same  subjective  likelihood  as  the  null  event  or  invariably  be 
seen  as  more  likely  to  occur.  Again,  this  seems  in  accord  with  one's 
intuition  about  such  discriminations. 

Although  I  find  these  results  encouraging  as  to  the  correctness 
both  of  the  decomposability  axiom  and  of  axiom  1,  they  do  seem  to 
discourage the hope that a simple, elegant probabilistic theory can be
developed for the utility of gambles. The main problem will be to see
in what, if any, sense the expected utility hypothesis is true within
this framework. One thing is certain: it is not true for the RSVP-scale
of theorem 3. This is suggested by theorem 10 of [6] where, under
somewhat different conditions, it was shown that a utility function
which is linear, i.e., satisfies the expected utility hypothesis, has
to be a Fechnerian scale; and we can show it directly. I shall outline
the proof without actually formally stating it. By $v$ being linear,
one means that there is a real-valued function $\phi$ on $E$ such that

$$v(a\alpha b) = v(a)\phi(\alpha) + v(b)\phi(\bar{\alpha}).$$

From theorem 2 of [6] we know that

$$P(a\alpha b, a\beta b) = P(a\bar{\beta}b, a\bar{\alpha}b).$$

Substituting the expression from theorem 3 and simplifying we obtain

$$v(a\alpha b)\,v(a\bar{\alpha}b) = v(a\beta b)\,v(a\bar{\beta}b),$$

so, for fixed $a$ and $b$, $v(a\alpha b)v(a\bar{\alpha}b)$ is independent of those $\alpha$ for which
the hypothesis of theorem 3 holds. But, by linearity,

$$v(a\alpha b)\,v(a\bar{\alpha}b) = \tfrac{1}{4}\left\{[v(a)+v(b)]^2[\phi(\alpha)+\phi(\bar{\alpha})]^2 - [v(a)-v(b)]^2[\phi(\alpha)-\phi(\bar{\alpha})]^2\right\},$$

which is easily seen to depend upon $\alpha$ in general. So $v$ cannot be linear.
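
The last step can be illustrated with exact arithmetic. The Python sketch below evaluates the product $v(a\alpha b)v(a\bar{\alpha}b)$ for a linear $v$ and compares it with the closed form just given. The scale values `va`, `vb` and the values of $\phi$ are hypothetical; in particular, taking $\phi(\bar{\alpha}) = 1 - \phi(\alpha)$ is an assumption made only for this example.

```python
from fractions import Fraction as Fr


def product(va, vb, phi, phi_bar):
    """v(a-alpha-b) * v(a-alphabar-b) for a linear v."""
    return (va * phi + vb * phi_bar) * (va * phi_bar + vb * phi)


def closed_form(va, vb, phi, phi_bar):
    """(1/4){[v(a)+v(b)]^2 [phi+phi_bar]^2 - [v(a)-v(b)]^2 [phi-phi_bar]^2}."""
    return Fr(1, 4) * ((va + vb) ** 2 * (phi + phi_bar) ** 2
                       - (va - vb) ** 2 * (phi - phi_bar) ** 2)


va, vb = Fr(2), Fr(1)  # hypothetical scale values with v(a) != v(b)

# Two hypothetical events with different values of phi(alpha).
at_half = product(va, vb, Fr(1, 2), Fr(1, 2))
at_quarter = product(va, vb, Fr(1, 4), Fr(3, 4))
print(at_half, at_quarter)
```

The two products differ, so with these values the product is not independent of the event, in line with the conclusion that $v$ cannot be linear.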


11.  Is  the  Basic  Axiom  True? 

No  doubt,  this  question  has  not  been  long  out  of  mind,  and 
I  suspect  that  most  readers  have  concluded  that  No,  in  general,  it  is 
not true. For instance, the following example, taken from [8], seems
at  first  to  cast  doubt  upon  it. 

"A gentleman wandering in a strange city at dinner time chances
upon a modest restaurant which he enters uncertainly. The waiter informs
him that there is no menu, but that this evening he may have either
broiled salmon at $2.50 or steak at [price illegible]. In a first-rate restaurant
his choice would have been steak, but considering his unknown surroundings
and the different prices he elects the salmon. Soon after, the waiter
returns from the kitchen, apologises profusely, blaming the uncommunicative
chef for omitting to tell him that fried snails and frog's legs are also
on the bill of fare at $4.50 each. It so happens that our hero detests
them both and would always select salmon in preference to either, yet
his response is 'Splendid, I'll change my order to steak.'"

Let  us  identify  the  following  sets  of  alternatives: 

T = {steak, salmon, snails, frog's legs},

S = {steak, salmon},

R = {salmon}.

The narrative certainly does not specify the exact probabilities of
choice, but from the way it was phrased one suspects that they were not
far from

$$P(R;T) = 0, \quad P(R;S) > 0, \quad \text{and} \quad P(S;T) = 1,$$

in which case axiom 1 fails to hold.


Or  does  it?  Are  T  and  S  actually  the  sets  of  alternatives  from 
which  the  selections  were  made?  Clearly  not,  for  adding  snails  and 
frog's  legs  did  not  just  increase  the  number  of  alternatives  by  two. 

They increased the diner's information as to exactly what the other two
alternatives were, and therefore changed them. At first, he viewed his
choice as being between "moderately expensive steak in a highly uncertain
restaurant" and "less expensive salmon in the same restaurant". After
the  waiter  returned,  he  was  led  to  view  it  as  being  between  "moderately 
expensive  steak  in  a  probably  good  restaurant"  and  "less  expensive  salmon 
in  the  same  restaurant".  In  other  words,  the  set  of  alternatives  was 
totally  changed.  Since  these  sets  of  alternatives  do  not  satisfy  the 
inclusion  relations  in  the  hypothesis  of  axiom  1,  there  is  no  reason 
to  expect  it  to  hold. 

Another,  and  an  important,  example  where  axiom  1  must  generally 
fail  is  when  a  decision  is  made  in  two  or  more  stages.  Let  us  consider 
the  two-stage  case.  A  choice  from  T  must  be  made.  Suppose  the  subject 
partitions $T$ into non-overlapping subsets $T_1, T_2, \ldots, T_n$, and first
chooses  one  of  these  subsets  and  then  an  alternative  from  the  chosen 
subset.  It  is  easy  to  show  that  if  axiom  1  holds  for  both  component 
decisions,  then  it  cannot  usually  hold  for  the  overall  choice  from  T. 

Of  course,  in  general,  it  is  only  the  overall  choice  that  can  be  observed. 
From  introspective  and  anecdotal  evidence,  one  is  reasonably  certain 
that  people  sometimes  decompose  decisions  in  this  manner,  and  it  is 
tempting  to  suppose  that  animals  do  so  too;  however,  there  have  been 
no  methods  for  determining  the  categorizations  used  by  non-humans, 
and  so  many  important  questions  about  the  structuring  of  alternatives 
have  remained  unanswered. 

One suggestion arises from this theory if the axiom
is accepted for the component decisions. As stated above, axiom 1
does not hold in general for a two-stage decision, but it is easy to show
that the equations

$$P(R;T) = P(R;T_i)P(T_i;T)$$

do  hold  for  the  subsets  forming  the  categorization.  Thus,  one  may 
propose  to  use  these  equations  to  detect  the  categorizations  from 
behavioral  data.  But  this  is  an  aside;  I  merely  wanted  to  point  out 
that  data  from  such  multi-stage  decisions  will  lead  to  a  rejection  of 
axiom  1  if  it  is  naively  interpreted. 
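
A small Python sketch may make the two-stage case concrete. It is only one possible model, built on assumptions of my own: each category has a fixed hypothetical scale value `W`, each element a hypothetical scale value `V`, and when a subset is offered the subject first chooses among the categories that are represented in it and then among the offered members of the chosen category. Axiom 1 then holds within each stage by construction.

```python
from fractions import Fraction as Fr

# Hypothetical categorization: T_1 = {a, b}, T_2 = {c, d}.
CATEGORY = {"a": 1, "b": 1, "c": 2, "d": 2}
W = {1: Fr(1), 2: Fr(1)}                              # hypothetical category scale
V = {"a": Fr(1), "b": Fr(1), "c": Fr(1), "d": Fr(1)}  # hypothetical element scale


def two_stage(x, offered):
    """P(x; offered): choose a represented category, then an element within it."""
    cats = {CATEGORY[y] for y in offered}
    p_cat = W[CATEGORY[x]] / sum(W[c] for c in cats)
    members = [y for y in offered if CATEGORY[y] == CATEGORY[x]]
    return p_cat * V[x] / sum(V[y] for y in members)


def prob(subset, offered):
    """P(subset; offered), summed over the elements of the subset."""
    return sum(two_stage(x, offered) for x in subset)


T = ["a", "b", "c", "d"]
S = ["a", "b", "c"]   # cuts across the categories
T1 = ["a", "b"]       # one of the categories
R = ["a"]

overall_lhs = prob(R, T)                   # P(R;T)
overall_rhs = prob(R, S) * prob(S, T)      # P(R;S) P(S;T)
category_rhs = prob(R, T1) * prob(T1, T)   # P(R;T_1) P(T_1;T)
print(overall_lhs, overall_rhs, category_rhs)
```

In this example the product through `S` differs from $P(R;T)$, while the product through the category `T1` reproduces it, which is the pattern one might look for in behavioral data.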

The  point  in  both  examples  is,  of  course,  that  no  general 
definition  of  the  concept  of  an  alternative  has  been  given  in  the  behavioral 
sciences.  Each  experimenter  must  make  his  own  identification  of 
alternatives  for  a  specific  empirical  context  in  terms  of  that  context. 

What  we  sorely  need  is  a  general  theory  of  alternatives.  To  be  sure, 
even without it, there is a remarkable amount of agreement as to what
the  alternatives  are  in  simple  situations;  nonetheless,  there  have  been 
many  disagreements  in  the  past.  The  question  I  raise  is  this:  are  there 
any  apparent  violations  of  axiom  1  that  cannot  be  eliminated  by  a  suitable 
and  intuitively  acceptable  redefinition  of  the  underlying  set  of 
alternatives? 

If  one  is  willing  to  conjecture  the  answer  No,  then  we  are  led 
into  an  extremely  subtle  problem  of  scientific  methodology  and  philosophy. 

If  one  can  always  save  axiom  1  by  a  redefinition  of  the  alternatives,  then 
one  is  led  to  suggest  that  axiom  1  be  accepted  as  correct  and  that  it  be 
used  to  determine  the  alternatives.  Clearly,  this  is  close  to  a  form 
of  insanity  in  which  truth  is  by  fiat,  for  why  not  choose  any  other 
relationship  and  set  it  up  as  a  law,  insisting  that  it  is  always  correct 
and that other concepts must be changed to make it true. Close though
it may be, it is not always insane provided that certain other conditions
are met. Certainly it has been done from time to time in physics with
great success, and some philosophers have felt themselves forced to the
position that some laws (e.g., conservation of energy) both possess
empirical content and, at the same time, serve as organizing principles
which  suggest  appropriate  definitions  in  new  areas  of  application. 

The question, then, becomes the conditions under which it is
not  insane  to  make  a  statement  of  the  form  "for  any  choice  situation, 
there  exists  a  definition  of  the  alternatives  such  that  axiom  1  is  true." 

I  do  not  know  whether  philosophers  have  evolved  such  a  list  of  conditions, 
but,  as  a  practicing  scientist,  I  think  the  following  list  will  prove  to  be 
minimal  in  this  case.  First,  for  a  wide  variety  of  situations  the  axiom 
will  have  to  be  verified  for  carefully  thought  out,  but  independently 
given,  definitions  of  the  alternatives.  By  and  large,  these  will  probably 
be relatively simple situations. Second, in cases where the axiom appears
to be violated, the required redefinition generally will have to result
in intuitively acceptable insights into behavior. In many cases, one would
expect the reaction "Of course, how did I miss that!" Third, the forced
redefinition  of  the  alternatives  will  have  to  be  comparatively  simple. 
Fourth,  the  axiom  will  have  to  have  such  rich  and  useful  consequences 
in  all  fields  of  choice  behavior  when  coupled  with  their  particular  laws 
that  more  will  be  lost  by  rejecting  it  than  by  keeping  it.  Put  another 
way,  it  will  have  to  be  compatible  with,  or  explain,  the  laws  that  have 
been  established  in  special  fields,  and  together  they  will  have  to 
explain  a  great  deal  of  observed  behavior. 

In the preceding sections I have tried to show that axiom 1's
range of application is fairly broad, but it still remains to see
just how deep it actually goes. For example, does the suggested
modification of stochastic learning theory actually account for
appreciably more learning phenomena than previous theories have? At
present, the most one can say is that axiom 1 shows some promising,
but highly inconclusive, symptoms of being a general law of choice behavior.